When a job scheduled by Slurm starts, it needs to know certain
things about how it was scheduled: for example, what its working
directory is, or which nodes were allocated to it. Slurm passes this
information to the job via environment variables. In addition to
being available to your job, these variables are also used by programs
like mpirun to set default values. This way, something like mpirun
already knows how many tasks to start and on which nodes, without
you needing to pass this information explicitly.
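For example, a job script can rely on these variables instead of hard-coding values. The following is a minimal sketch, not a complete recipe: the executable name my_mpi_program is hypothetical, and whether mpirun picks up the task count automatically depends on your MPI installation.
#!/bin/bash
#Example job script sketch; my_mpi_program is a hypothetical executable
#SBATCH --ntasks=11
#SBATCH --time=00:30:00

# Echo a few of the Slurm-provided variables into the job output
echo "Job $SLURM_JOB_ID was submitted from $SLURM_SUBMIT_DIR"
echo "Nodes allocated: $SLURM_JOB_NODELIST"

# mpirun can read the task count and node list from the Slurm environment,
# so no explicit -np or host list is given here
mpirun ./my_mpi_program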
The following is a list of commonly used variables that Slurm sets for each job, along with a brief description, a sample value, and the nearest analog for PBS/Torque-based schedulers. A full list of the variables Slurm sets for each job is available in the sbatch man page.
Slurm Job Environment Variables

| Slurm Variable Name | Description | Example values | PBS/Torque analog |
|---|---|---|---|
| $SLURM_CPUS_ON_NODE | Number of cores/node | 8,3 | $PBS_NUM_PPN |
| $SLURM_CPUS_PER_TASK | Number of cores per task, i.e. the value given to the --cpus-per-task or -c sbatch option. Not set unless one of those options is given. | 8,3 | $PBS_NUM_PPN |
| $SLURM_JOB_ID | Job ID | 5741192 | $PBS_JOBID |
| $SLURM_JOBID | Deprecated. Same as SLURM_JOB_ID | 5741192 | $PBS_JOBID |
| $SLURM_JOB_NAME | Job name | myjob | $PBS_JOBNAME |
| $SLURM_JOB_NODELIST | Nodes assigned to job | compute-b24-[1-3,5-9],compute-b25-[1,4,8] | cat $PBS_NODEFILE |
| $SLURM_JOB_NUM_NODES | Number of nodes allocated to job | 2 | $PBS_NUM_NODES |
| $SLURM_LOCALID | Index of the task relative to the tasks on its node (node-local task ID) | 4 | |
| $SLURM_NODEID | Index of the node running on, relative to the nodes assigned to the job | 0 | $PBS_O_NODENUM |
| $SLURM_NNODES | Deprecated. Same as SLURM_JOB_NUM_NODES | 4 | $PBS_NUM_NODES |
| $SLURM_NODELIST | Deprecated. Same as SLURM_JOB_NODELIST | compute-b24-[1-3,5-9],compute-b25-[1,4,8] | cat $PBS_NODEFILE |
| $SLURM_NTASKS | Total number of tasks in the job (the value of the --ntasks/-n option) | 11 | $PBS_NP |
| $SLURM_PROCID | Index of the task relative to the job | 0 | $PBS_O_TASKNUM - 1 |
| $SLURM_SUBMIT_DIR | Directory the job was submitted from | /lustre/payerle/work | $PBS_O_WORKDIR |
| $SLURM_SUBMIT_HOST | Host the job was submitted from | login-1.deepthought2.umd.edu | $PBS_O_HOST |
| $SLURM_TASKS_PER_NODE | Comma-delimited list of the number of tasks on each node, in the same order as SLURM_JOB_NODELIST. If consecutive nodes have the same task count, the count is followed by "(xN)"; the example value means 2 tasks on each of the first three nodes and 1 task on the fourth. | 2(x3),1 | |
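The compact SLURM_TASKS_PER_NODE format can be expanded inside a job script when a plain per-node list of task counts is needed. Below is a minimal bash sketch (expand_tasks_per_node is a hypothetical helper name, not a Slurm command) that turns a value like 2(x3),1 into 2 2 2 1:
#Sketch of expanding the SLURM_TASKS_PER_NODE notation in bash
expand_tasks_per_node() {
    local item count reps i
    for item in ${1//,/ }; do            # split the list on commas
        count=${item%%"("*}              # task count before any "(xN)"
        if [[ $item == *x* ]]; then
            reps=${item#*x}              # repetition count, e.g. "3)"
            reps=${reps%")"}             # strip the trailing ")"
        else
            reps=1
        fi
        for ((i = 0; i < reps; i++)); do
            printf '%s ' "$count"
        done
    done
    printf '\n'
}

expand_tasks_per_node "2(x3),1"          # prints: 2 2 2 1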
The list of nodes allocated to a job is presented in a compact notation,
in which square brackets ( [ and ] ) are used to delimit lists and/or
ranges of numeric values. This compact form saves space in the
environment and in displays, but it is often not the most useful form
in scripts, where a fully expanded list might be more convenient.
To convert between the two formats, there are subcommands of the
scontrol command, e.g.
#Example of using scontrol show hostnames, using example from above
login-2:~: scontrol show hostnames 'compute-b24-[1-3,5-9],compute-b25-[1,4,8]'
compute-b24-1
compute-b24-2
compute-b24-3
compute-b24-5
compute-b24-6
compute-b24-7
compute-b24-8
compute-b24-9
compute-b25-1
compute-b25-4
compute-b25-8
login-2:~:
#And now for the reverse
login-2:~: scontrol show hostlist 'compute-b24-1,compute-b24-2,compute-b24-3,compute-b24-5,compute-b24-6,compute-b24-7,compute-b24-8,compute-b24-9,compute-b25-1,compute-b25-4,compute-b25-8'
compute-b24-[1-3,5-9],compute-b25-[1,4,8]
login-2:~:
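Within a job script, the same conversion is usually applied to SLURM_JOB_NODELIST, for example to run some per-node step. A minimal sketch (the echo is just a placeholder for whatever per-node work you need):
#Sketch of looping over the expanded node list inside a job script
for node in $(scontrol show hostnames "$SLURM_JOB_NODELIST"); do
    echo "Job $SLURM_JOB_ID was allocated node: $node"
done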