USDA SCINet Ceres HPC Configuration

NB: You will need an account on the Ceres cluster to run the pipeline.

All nf-core pipelines have been successfully configured for use on the Ceres cluster at the United States Department of Agriculture (USDA) Agricultural Research Service (ARS) Scientific Computing Network (SCINet).

To use it, run the pipeline with -profile ceres. This will download and launch the ceres.config, which has been pre-configured with a setup suitable for the Ceres cluster. Using this profile configures Nextflow to download each required software package as a Singularity image when the pipeline first needs it.

Before running the pipeline, you will need to load Singularity and Nextflow using the environment module system on Ceres. You can do this by issuing the commands:

module load singularity/3.10.2
module load nextflow/22.04.3
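
Once the modules are loaded, it can be useful to confirm that both tools are actually on your PATH before launching a run. The following is a minimal sketch, assuming a POSIX-compatible shell; check_tools is a hypothetical helper written for illustration and is not provided by Ceres or nf-core.

```shell
# Hypothetical helper: report any named command that is not on PATH.
# Returns 0 when every command is found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Intended use on Ceres, after the two module load commands:
#   check_tools singularity nextflow && echo "ready"
```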

File storage recommendations

All of the intermediate files required to run the pipeline will be stored in the work/ directory by default. As /project directories have a limited quota, it is recommended to store raw data in /project and use Nextflow’s -work-dir option (short form -w) to place intermediate files in a /90daydata directory instead, for example:

nextflow run nf-core/mag -profile ceres,test -w /90daydata/shared/$USER/.nextflow/work

Storage in /90daydata does not count against your account, but files there are deleted 90 days after creation, which makes it an effective short-term cache for intermediate files in case you wish to re-run a pipeline. All of the main output files will be saved to the directory you specify with --outdir.
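
To keep intermediate files and final outputs apart, the work directory path can be built once and reused across runs. This is a sketch only: ceres_work_dir is a hypothetical helper, the path layout is copied from the example command above, and the output directory name in the usage comment is an assumption.

```shell
# Hypothetical helper: print the work directory used in the example above.
# $1 is the user name; it falls back to $USER when omitted.
ceres_work_dir() {
    printf '/90daydata/shared/%s/.nextflow/work\n' "${1:-$USER}"
}

# Intended use on a Ceres login node
# (the /project/... output path is a placeholder, not a real location):
#   nextflow run nf-core/mag -profile ceres,test \
#       -w "$(ceres_work_dir)" \
#       --outdir /project/<your_project>/results
```

Passing the same -w path together with Nextflow's -resume option lets a repeated run reuse cached results for as long as the files remain in /90daydata.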

NB: Nextflow submits jobs to the HPC cluster via SLURM, so the commands above must be run on one of the login nodes. If in doubt, contact the SCINet Virtual Research Support Core (VRSC).

Config file

See config file on GitHub

params {
    config_profile_description = 'USDA ARS SCINet Ceres Cluster profile'
    config_profile_contact = 'Thomas A. Christensen II (@MillironX)'
    config_profile_url = 'https://scinet.usda.gov/guide/ceres/'

    max_memory = 640.GB
    max_cpus = 36
    max_time = 60.d
}

singularity {
    enabled = true
    autoMounts = true
}

process {
    resourceLimits = [
        memory: 640.GB,
        cpus: 36,
        time: 60.d
    ]
    executor = 'slurm'
    scratch = true
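    // Choose the SLURM partition from the memory and time each task requests.
    queue = {
        switch (task.memory) {
            // High-memory tasks (216 GB or more) go to the memory partitions
            case { it >= 216.GB }:
                switch (task.time) {
                    case { it >= 7.d }:
                        return 'longmem'
                    default:
                        return 'mem'
                }
            // All other tasks are routed by requested run time only
            default:
                switch (task.time) {
                    case { it >= 21.d }:
                        return 'long60'
                    case { it >= 7.d }:
                        return 'long'
                    case { it >= 48.h }:
                        return 'medium'
                    default:
                        return 'short'
                }
        }
    }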
}
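
To see which partition a given task would land on, the queue rule in the config above can be mirrored outside Nextflow. The sketch below is an illustration only, not part of the profile: select_queue is a hypothetical function, and it assumes whole gigabytes and whole hours as inputs (7 days = 168 hours, 21 days = 504 hours), whereas the config compares Nextflow memory and duration values.

```shell
# Hypothetical mirror of the queue rule in ceres.config.
# Usage: select_queue MEM_GB TIME_HOURS
select_queue() {
    mem_gb=$1
    hours=$2
    if [ "$mem_gb" -ge 216 ]; then
        # High-memory branch: only the 7-day threshold matters
        if [ "$hours" -ge 168 ]; then
            echo longmem
        else
            echo mem
        fi
    elif [ "$hours" -ge 504 ]; then
        echo long60
    elif [ "$hours" -ge 168 ]; then
        echo long
    elif [ "$hours" -ge 48 ]; then
        echo medium
    else
        echo short
    fi
}

# Examples:
#   select_queue 8 4      prints short
#   select_queue 8 48     prints medium
#   select_queue 300 10   prints mem
```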