When a job scheduled by Slurm starts, it needs to know certain
things about how it was scheduled: for example, what its working
directory is, or which nodes were allocated to it. Slurm passes this
information to the job via environment variables. In addition to
being available to your job, these variables are also used by programs
like mpirun to set default values. This way, something like
mpirun already knows how many tasks to start and on which
nodes, without you needing to pass this information explicitly.
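As a rough illustration of how a job script might read these variables, here is a minimal bash sketch. The function name job_summary and the fallback values are assumptions made for this example (the fallbacks only exist so the fragment also runs outside a Slurm allocation); the variable names themselves are the ones Slurm sets.

```shell
#!/bin/bash
# Hypothetical job-script fragment: report what Slurm told this job.
# job_summary is an illustrative name, not part of Slurm.
job_summary() {
    # The ${VAR:-default} fallbacks are an assumption added so that the
    # fragment can also be run outside of a Slurm job.
    local nodes="${SLURM_JOB_NUM_NODES:-1}"
    local tasks="${SLURM_NTASKS:-1}"
    local nodelist="${SLURM_JOB_NODELIST:-unknown}"
    echo "nodes=${nodes} tasks=${tasks} nodelist=${nodelist}"
}

job_summary
```

Inside a real job the line printed reflects the allocation; outside of one it falls back to the placeholder defaults.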
The following is a list of commonly used variables that are set by Slurm for each job, along with a brief description, sample value, and the nearest analog for PBS/Torque based schedulers. A full list of the variables set by Slurm for each job is available in the sbatch man page.
|Slurm Job Environment Variables|
|Slurm Variable Name||Description||Example values||PBS/Torque analog|
|$SLURM_CPUS_ON_NODE||Number of cores/node||8,3||$PBS_NUM_PPN|
|$SLURM_CPUS_PER_TASK||Number of cores per task. I.e., the value given
|$SLURM_JOBID||Deprecated. Same as SLURM_JOB_ID|
|$SLURM_JOB_NODELIST||Nodes assigned to job||compute-b24-[1-3,5-9],compute-b25-[1,4,8]||cat $PBS_NODEFILE|
|$SLURM_JOB_NUM_NODES||Number of nodes allocated to job||2||$PBS_NUM_NODES|
|$SLURM_LOCALID||Index to core running on
|$SLURM_NODEID||Index of the node running on, relative to the nodes assigned to the job|
|$SLURM_NNODES||Deprecated. Same as SLURM_JOB_NUM_NODES||4||cat $PBS_NODEFILE|
|$SLURM_NODELIST||Deprecated. Same as SLURM_JOB_NODELIST||compute-b24-[1-3,5-9],compute-b25-[1,4,8]||cat $PBS_NODEFILE|
|$SLURM_NTASKS||Total number of tasks for the job (equal to the number of cores only when each task uses a single core)||11||$PBS_NP|
|$SLURM_PROCID||Index of task relative to job||0||$PBS_O_TASKNUM - 1|
|$SLURM_SUBMIT_HOST||Host submitted from||login-1.deepthought2.umd.edu||$PBS_O_HOST|
|$SLURM_TASKS_PER_NODE||A comma-delimited list of integers giving the number of tasks on each node, using the same ordering as in SLURM_JOB_NODELIST. If consecutive nodes have the same task count, the integer is followed by '(xN)', so the example value means 2 tasks on each of the first three nodes and 1 task on the fourth node.||2(x3),1||$PBS_O_HOST|
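The '(xN)' shorthand used by SLURM_TASKS_PER_NODE can be awkward to consume in a script. The following is a minimal bash sketch of one way to expand it into one task count per node. The helper name expand_tasks_per_node is hypothetical and not a Slurm command, and the default value is simply the example value from the table.

```shell
#!/bin/bash
# Expand a SLURM_TASKS_PER_NODE-style string such as "2(x3),1" into one
# task count per line, in node order.
# expand_tasks_per_node is a hypothetical helper, not part of Slurm.
expand_tasks_per_node() {
    local spec="$1" entry count reps i
    local re='^([0-9]+)\(x([0-9]+)\)$'   # matches entries like "2(x3)"
    local -a entries
    IFS=',' read -r -a entries <<< "$spec"
    for entry in "${entries[@]}"; do
        if [[ "$entry" =~ $re ]]; then
            count="${BASH_REMATCH[1]}"
            reps="${BASH_REMATCH[2]}"
        else
            # A bare integer applies to exactly one node.
            count="$entry"
            reps=1
        fi
        for ((i = 0; i < reps; i++)); do
            echo "$count"
        done
    done
}

# Falls back to the example value from the table when run outside a job.
expand_tasks_per_node "${SLURM_TASKS_PER_NODE:-2(x3),1}"
```

For the example value 2(x3),1 this prints 2, 2, 2 and 1 on separate lines, one line per node in the order of SLURM_JOB_NODELIST.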
The list of nodes allocated to a job is presented in a compact notation,
in which square brackets (i.e. [ and ]) are
used to delimit lists and/or ranges of numeric values. This compact
form saves space in the environment and in displays, but is often
not the most useful in scripts, where a fully expanded list might
be more convenient.
To convert between the two formats, there are subcommands of the
scontrol command, e.g.
#Example of using scontrol show hostnames, using example from above
login-2:~: scontrol show hostnames 'compute-b24-[1-3,5-9],compute-b25-[1,4,8]'
compute-b24-1
compute-b24-2
compute-b24-3
compute-b24-5
compute-b24-6
compute-b24-7
compute-b24-8
compute-b24-9
compute-b25-1
compute-b25-4
compute-b25-8
login-2:~: #And now for the reverse
login-2:~: scontrol show hostlist 'compute-b24-1,compute-b24-2,compute-b24-3,compute-b24-5,compute-b24-6,compute-b24-7,compute-b24-8,compute-b24-9,compute-b25-1,compute-b25-4,compute-b25-8'
compute-b24-[1-3,5-9],compute-b25-[1,4,8]
login-2:~:
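Within a job script, the same scontrol show hostnames call can feed a loop over the allocated nodes. The following is a minimal sketch, assuming bash 4 or later (for mapfile). The helper name list_job_nodes is hypothetical, and the guard around the loop is an assumption added so the fragment does nothing when run outside a Slurm job.

```shell
#!/bin/bash
# list_job_nodes is a hypothetical helper: it prints one hostname per
# line for the current job, using scontrol to expand the compact list.
list_job_nodes() {
    if [[ -z "${SLURM_JOB_NODELIST:-}" ]]; then
        echo "SLURM_JOB_NODELIST is not set; not inside a Slurm job?" >&2
        return 1
    fi
    scontrol show hostnames "$SLURM_JOB_NODELIST"
}

# Only attempt the expansion when scontrol exists and we are in a job.
if command -v scontrol >/dev/null 2>&1 && [[ -n "${SLURM_JOB_NODELIST:-}" ]]; then
    mapfile -t nodes < <(list_job_nodes)
    echo "job has ${#nodes[@]} node(s); first is ${nodes[0]}"
    for node in "${nodes[@]}"; do
        echo "allocated: $node"
    done
fi
```

Reading the expanded names into an array keeps each hostname as a separate element, which is usually easier to index and iterate over than the compact bracketed form.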