Slurm Environment Variables

When a job scheduled by Slurm starts, it needs to know certain things about how it was scheduled: e.g., what its working directory is, or which nodes were allocated to it. Slurm passes this information to the job via environment variables. In addition to being available to your job, these are also used by programs like mpirun to set default values. This way, something like mpirun already knows how many tasks to start and on which nodes, without you needing to pass this information explicitly.
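For example, a job script can reference these variables like any other shell variable. The following is a minimal sketch (the #SBATCH resource requests shown are illustrative only):

#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --time=00:05:00

# Print a few of the variables Slurm sets for this job
echo "Job ID:           $SLURM_JOB_ID"
echo "Job name:         $SLURM_JOB_NAME"
echo "Node list:        $SLURM_JOB_NODELIST"
echo "Number of nodes:  $SLURM_JOB_NUM_NODES"
echo "Number of tasks:  $SLURM_NTASKS"
echo "Submit directory: $SLURM_SUBMIT_DIR"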

The following is a list of commonly used variables that are set by Slurm for each job, along with a brief description, sample values, and the nearest analog for PBS/Torque-based schedulers. A full list of the variables set by Slurm for each job is available in the sbatch man page.

Slurm Job Environment Variables

$SLURM_CPUS_ON_NODE
    Description: Number of cores allocated on the node.
    Example values: 8, 3
    PBS/Torque analog: $PBS_NUM_PPN

$SLURM_CPUS_PER_TASK
    Description: Number of cores per task, i.e. the value given to the --cpus-per-task or -c sbatch option. Not set unless one of those options is given.
    Example values: 8, 3
    PBS/Torque analog: $PBS_NUM_PPN

$SLURM_JOB_ID
    Description: Job ID.
    Example value: 5741192
    PBS/Torque analog: $PBS_JOBID

$SLURM_JOBID
    Description: Deprecated. Same as SLURM_JOB_ID.

$SLURM_JOB_NAME
    Description: Job name.
    Example value: myjob
    PBS/Torque analog: $PBS_JOBNAME

$SLURM_JOB_NODELIST
    Description: Nodes assigned to the job, in compact hostlist notation.
    Example value: compute-b24-[1-3,5-9],compute-b25-[1,4,8]
    PBS/Torque analog: cat $PBS_NODEFILE

$SLURM_JOB_NUM_NODES
    Description: Number of nodes allocated to the job.
    Example value: 2
    PBS/Torque analog: $PBS_NUM_NODES

$SLURM_LOCALID
    Description: Index of the task relative to the node it is running on (the node-local task index).
    Example value: 4

$SLURM_NODEID
    Description: Index of the node the process is running on, relative to the nodes assigned to the job.
    Example value: 0
    PBS/Torque analog: $PBS_O_NODENUM

$SLURM_NNODES
    Description: Deprecated. Same as SLURM_JOB_NUM_NODES.
    Example value: 4
    PBS/Torque analog: $PBS_NUM_NODES

$SLURM_NODELIST
    Description: Deprecated. Same as SLURM_JOB_NODELIST.
    Example value: compute-b24-[1-3,5-9],compute-b25-[1,4,8]
    PBS/Torque analog: cat $PBS_NODEFILE

$SLURM_NTASKS
    Description: Total number of tasks in the job, i.e. the value given to the --ntasks or -n sbatch option.
    Example value: 11
    PBS/Torque analog: $PBS_NP

$SLURM_PROCID
    Description: Index of the task relative to the job (equivalent to the MPI rank).
    Example value: 0
    PBS/Torque analog: $PBS_O_TASKNUM - 1

$SLURM_SUBMIT_DIR
    Description: Directory the job was submitted from.
    Example value: /lustre/payerle/work
    PBS/Torque analog: $PBS_O_WORKDIR

$SLURM_SUBMIT_HOST
    Description: Host the job was submitted from.
    Example value: login-1.deepthought2.umd.edu
    PBS/Torque analog: $PBS_O_HOST

$SLURM_TASKS_PER_NODE
    Description: Comma-delimited list of integers giving the number of tasks on each node, in the same order as SLURM_JOB_NODELIST. If consecutive nodes have the same task count, the integer is followed by '(xN)'; the example value means 2 tasks on each of the first three nodes and 1 task on the fourth node.
    Example value: 2(x3),1
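These variables are commonly used to keep a job script in sync with the resources Slurm actually granted, rather than hard-coding values in two places. As one sketch of this pattern (the #SBATCH values are placeholders, and forwarding to OpenMP is just an illustration), a job requesting --cpus-per-task can pass that value on as a thread count:

#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=8

# Use as many OpenMP threads as Slurm allocated cores per task.
# Fall back to 1 thread if --cpus-per-task was not given (variable unset).
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}

echo "Job $SLURM_JOB_ID: $SLURM_NTASKS tasks, $OMP_NUM_THREADS thread(s) per task, on $SLURM_JOB_NUM_NODES node(s)"

Because SLURM_CPUS_PER_TASK is only set when --cpus-per-task or -c is given, the ${...:-1} default keeps the script working when that option is omitted.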

Scontrol and hostnames/hostlists

The list of nodes allocated to a job is presented in a compact notation, in which square brackets (i.e. [ and ]) are used to delimit lists and/or ranges of numeric values. This compact form saves space in the environment and in displays, but is often not the most useful in scripts, where a fully expanded list might be more convenient.

To convert between the two formats, use the show hostnames and show hostlist subcommands of the scontrol command, e.g.

#Example of scontrol show hostnames, applied to the node list from above
login-2:~: scontrol show hostnames 'compute-b24-[1-3,5-9],compute-b25-[1,4,8]'
compute-b24-1
compute-b24-2
compute-b24-3
compute-b24-5
compute-b24-6
compute-b24-7
compute-b24-8
compute-b24-9
compute-b25-1
compute-b25-4
compute-b25-8
login-2:~:
#And now for the reverse
login-2:~: scontrol show hostlist 'compute-b24-1,compute-b24-2,compute-b24-3,compute-b24-5,compute-b24-6,compute-b24-7,compute-b24-8,compute-b24-9,compute-b25-1,compute-b25-4,compute-b25-8'
compute-b24-[1-3,5-9],compute-b25-[1,4,8]
login-2:~:
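
Inside a job script, the same command is typically applied to $SLURM_JOB_NODELIST to get one hostname per line. A minimal sketch of looping over the allocated nodes (the echo is just a placeholder for whatever per-node action you need):

#Expand this job's node list and loop over the individual hostnames
for node in $(scontrol show hostnames "$SLURM_JOB_NODELIST"); do
    echo "Job $SLURM_JOB_ID was allocated node: $node"
done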