The $PBS_NODEFILE environment variable contains the name of a file that lists all of the nodes you have been assigned.
Although the cluster is primarily designed for parallel jobs, there are occasions when one wishes to submit jobs that use only a single CPU core, e.g. for embarrassingly parallel or high-throughput computing tasks.
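A minimal single-core batch script can look like the following (the time and memory values are illustrative only; the hostname command is just a placeholder for your serial program):

```shell
#!/bin/bash
# Minimal single-core job: one task, one CPU core.
# Time and memory values below are illustrative only.
#SBATCH --ntasks=1
#SBATCH -t 00:10:00
#SBATCH --mem-per-cpu=2048

# Any serial command goes here; /bin/hostname is just a placeholder.
/bin/hostname
```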
OpenMPI is the preferred MPI implementation unless your application specifically requires one of the alternate MPI variants. Slurm and OpenMPI interact well together, which makes OpenMPI easy to use. OpenMPI is also compiled with support for all of the cluster's interconnect hardware, so on nodes with a fast transport (e.g. InfiniBand) the fastest interface is selected automatically.
The following example will run the MPI executable alltoall on a total of 40 cores. For further information on the module load command, see the section Setting Up Your Environment.
#!/bin/tcsh
#SBATCH --ntasks=40
#SBATCH -t 00:01:00
#SBATCH --mem-per-cpu=2048
#SBATCH --exclusive
module unload intel
#It is recommended that you add the exact version of the
#compiler and MPI library used when you compiled the code
#to improve long-term reproducibility
module load gcc
module load openmpi
mpirun alltoall
The above is for C-shell style shells. The Bourne-style shell version is similar:
#!/bin/bash
#SBATCH --ntasks=40
#SBATCH -t 00:01:00
#SBATCH --mem-per-cpu=2048
#SBATCH --exclusive
. ~/.bashrc
module unload intel
#It is recommended that you add the exact version of the
#compiler and MPI library used when you compiled the code
#to improve long-term reproducibility
module load gcc
module load openmpi
mpirun alltoall
NOTE: the addition of the . ~/.bashrc line is necessary on the Deepthought2 cluster if your default shell is not bash; otherwise the dot files (and the definitions of the module and tap commands) will NOT get loaded. On the Juggernaut cluster, if your default shell is not bash, use . ~/.bash_profile instead.
The Intel MPI libraries are available if you compiled your code with the Intel compilers. Slurm and the Intel MPI libraries interact well together, which makes them easy to use.
The following example will run the MPI executable alltoall on a total of 40 cores. For further information on the module load command, see the section Setting Up Your Environment.
#!/bin/tcsh
#SBATCH --ntasks=40
#SBATCH -t 00:01:00
#SBATCH --mem-per-cpu=2048
#SBATCH --exclusive
module load intel
mpirun alltoall
The above is for C-shell style shells. The Bourne-style shell version is similar:
#!/bin/bash
#SBATCH --ntasks=40
#SBATCH -t 00:01:00
#SBATCH --mem-per-cpu=2048
#SBATCH --exclusive
. ~/.bashrc
module load intel
mpirun alltoall
NOTE: the addition of the . ~/.bashrc line is necessary on the Deepthought2 cluster if your default shell is not bash; otherwise the dot files (and the definitions of the module and tap commands) will NOT get loaded. On the Juggernaut cluster, if your default shell is not bash, use . ~/.bash_profile instead.
NOTE: Use of the LAM MPI libraries is no longer supported on the Deepthought HPC clusters. Please use either the latest OpenMPI or Intel MPI libraries instead.
NOTE: The LAM MPI library function which parses the host string from Slurm appears to be broken. As the LAM MPI libraries are no longer maintained by their authors, this cannot be fixed by upgrading. The following provides a workaround, but it is STRONGLY recommended that you move to another MPI library.
The following example will run the MPI executable alltoall on a total of 40 cores. For further information on the tap command, see the section Setting Up Your Environment.
#!/bin/tcsh
#SBATCH -t 00:01:00
#SBATCH --ntasks=40
#SBATCH --mem-per-cpu=2048
#SBATCH --exclusive
#Generate a PBS_NODEFILE format nodefile
set PBS_NODEFILE=`/usr/local/slurm/bin/generate_pbs_nodefile`
#and convert it to LAM's desired format
set MPI_NODEFILE=$WORKDIR/mpd_nodes.${SLURM_JOBID}
sort $PBS_NODEFILE | uniq -c | awk '{ printf("%s cpu=%s\n", $2, $1); }' > $MPI_NODEFILE
tap lam-gnu
lamboot $MPI_NODEFILE
mpirun -np $SLURM_NTASKS C alltoall
lamclean
lamhalt
NOTE: Use of the MPICH MPI libraries is no longer supported on the Deepthought HPC clusters. Please use either the latest OpenMPI or Intel MPI libraries instead.
The following example will run the MPI executable alltoall on a total of 40 cores. For further information on the tap command, see the section Setting Up Your Environment.
Note also that if you've never run MPICH before, you'll need to create the file .mpd.conf in your home directory. This file should contain at least one line of the form MPD_SECRETWORD=we23jfn82933. (Do NOT use the example value; make up your own secret word.)
#!/bin/tcsh
#SBATCH -t 1:00
#SBATCH --ntasks=40
#SBATCH --mem-per-cpu=2048
#SBATCH --exclusive
tap mpich-gnu
#Generate a PBS_NODEFILE format nodefile
set PBS_NODEFILE=`/usr/local/slurm/bin/generate_pbs_nodefile`
#and convert it to MPICH's desired format
set MPI_NODEFILE=/tmp/mpd_nodes.${SLURM_JOBID}
sort $PBS_NODEFILE | uniq -c | awk '{ printf("%s:%s\n", $2, $1); }' > $MPI_NODEFILE
mpdboot -n $SLURM_JOB_NUM_NODES -f $MPI_NODEFILE
mpiexec -n $SLURM_NTASKS alltoall
mpdallexit
The above assumes a csh-like shell. For bourne shell/bash users, the equivalent script would be
#!/bin/bash
#SBATCH -t 1:00
#SBATCH --ntasks=40
#SBATCH --mem-per-cpu=2048
#SBATCH --exclusive
. ~/.bashrc
SHELL=bash
tap mpich-gnu
#Generate a PBS_NODEFILE format nodefile
PBS_NODEFILE=`/usr/local/slurm/bin/generate_pbs_nodefile`
#and convert it to MPICH's desired format
MPI_NODEFILE=/tmp/mpd_nodes.${SLURM_JOBID}
sort $PBS_NODEFILE | uniq -c | awk '{ printf("%s:%s\n", $2, $1); }' > $MPI_NODEFILE
mpdboot -n $SLURM_JOB_NUM_NODES -f $MPI_NODEFILE
mpiexec -n $SLURM_NTASKS alltoall
mpdallexit
NOTE: the addition of the . ~/.bashrc line is necessary on the Deepthought2 cluster if your default shell is not bash; otherwise the dot files (and the definitions of the module and tap commands) will NOT get loaded. On the Juggernaut cluster, if your default shell is not bash, use . ~/.bash_profile instead.
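As an illustration of what the sort/uniq/awk pipeline in the scripts above produces, consider a hypothetical PBS-style nodefile (host names are made up), which lists each node once per assigned core:

```shell
# A PBS-style nodefile lists each node once per assigned core;
# MPICH's mpd format wants one "host:ncpus" line per node.
printf 'node01\nnode01\nnode01\nnode02\n' > /tmp/demo_pbs_nodefile
sort /tmp/demo_pbs_nodefile | uniq -c | awk '{ printf("%s:%s\n", $2, $1); }'
# prints:
# node01:3
# node02:1
```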
The following example will run a command on each of the nodes in the assigned list. It uses ssh to communicate between nodes. If your shell is csh/tcsh, use this:
#!/bin/tcsh
#SBATCH --ntasks=40
#SBATCH -t 00:01:00
#SBATCH --mem-per-cpu=2048
#SBATCH --exclusive
set COMMAND=/bin/hostname
#We want a list of nodes, one per line.
#Use the PBS compatibility wrapper to make a PBS style nodefile
set PBS_NODEFILE=`/usr/local/slurm/bin/generate_pbs_nodefile`
foreach node (`cat $PBS_NODEFILE`)
ssh $node $COMMAND &
end
wait
Here we "cheated" and used the Slurm PBS compatibility wrapper script to convert Slurm's abbreviated list of nodes into a PBS-like nodes file, which we then use to launch our ssh tasks.
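Alternatively, Slurm itself can expand the abbreviated node list via the scontrol show hostnames subcommand, with no PBS wrapper needed. A bash sketch (verify that this subcommand is available in your cluster's Slurm version):

```shell
#!/bin/bash
#SBATCH --ntasks=40
#SBATCH -t 00:01:00
#SBATCH --mem-per-cpu=2048
#SBATCH --exclusive
COMMAND=/bin/hostname
# Inside a job, $SLURM_JOB_NODELIST holds the compressed node list
# (e.g. "node[01-04]"); scontrol expands it to one hostname per line.
for node in $(scontrol show hostnames "$SLURM_JOB_NODELIST"); do
    ssh "$node" "$COMMAND" &
done
wait
```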
Note the use of the ampersand (&) on the ssh command, and the wait command at the end of the loop. The ampersand causes the processes to run in parallel (otherwise each invocation of $COMMAND via ssh would need to complete before the next one starts). The wait command is necessary to prevent the main script from exiting before all of the spawned ssh processes have completed.
The above example has issues with accurately reporting the exit code of each of the spawned commands. This could probably be implemented in the bash version below, but it would significantly complicate the script, and it may not even be possible given the limitations of the wait command in the C-shell variants.
Note: In order for this to work, you need to set up passwordless ssh among the HPC compute nodes.
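A one-time setup along the following lines usually suffices, assuming your home directory is shared across the compute nodes (which is typical on clusters; the key type and file names below are just the common defaults):

```shell
# Generate a key with no passphrase and authorize it for yourself.
# With a shared home directory this enables node-to-node ssh.
mkdir -p ~/.ssh
ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519
cat ~/.ssh/id_ed25519.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```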
And if you prefer bash, use this:
#!/bin/bash
#SBATCH --ntasks=40
#SBATCH -t 00:01:00
#SBATCH --mem-per-cpu=2048
#SBATCH --exclusive
COMMAND=/bin/hostname
#We want a list of nodes, one per line.
#Use the PBS compatibility wrapper to make a PBS style nodefile
PBS_NODEFILE=`/usr/local/slurm/bin/generate_pbs_nodefile`
for node in `cat $PBS_NODEFILE`; do
ssh $node $COMMAND &
done
wait
The above examples are general enough to handle tasks running on different nodes. If you know (because of the number of cores requested relative to the smallest number of cores available on a node, or because of the way you requested the cores) that all the cores will be on the same node, you can forgo the ssh part and just have the main script invoke the command on the current node. E.g., for csh,
#!/bin/tcsh
#SBATCH --ntasks=8
#SBATCH -t 00:01:00
#SBATCH --mem-per-cpu=2048
#SBATCH --oversubscribe
set COMMAND=/bin/hostname
#We want a list of nodes, one per line.
#Use the PBS compatibility wrapper to make a PBS style nodefile
set PBS_NODEFILE=`/usr/local/slurm/bin/generate_pbs_nodefile`
foreach node (`cat $PBS_NODEFILE`)
#In this case, it is assumed we *know* that all the assigned
#cores are on the same node.
$COMMAND &
end
wait
If you have any doubts, however, the general, multinode-capable version is better. It will handle the case when all cores are on the same node as well as when they are divided across multiple nodes, and the penalty for the extra ssh is usually negligible.
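Note also that Slurm's own launcher, srun, handles both the single-node and multi-node cases directly, without any ssh or nodefile handling. A bash sketch:

```shell
#!/bin/bash
#SBATCH --ntasks=40
#SBATCH -t 00:01:00
#SBATCH --mem-per-cpu=2048
#SBATCH --exclusive
# srun starts one copy of the command per task on whichever nodes
# Slurm assigned, and waits for all of them to finish.
srun /bin/hostname
```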
NOTE: all of the above are simplistic cases for example purposes. Your code still needs to somehow implement communication between the tasks, which is the main raison d'être for the MPI standard. If your code does not need communication between the tasks, then it is by definition embarrassingly parallel and should be submitted as N distinct jobs rather than a single job with N tasks.
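One convenient way to submit N such distinct jobs is a Slurm job array; a minimal bash sketch (the array range 1-40 and the input file naming are illustrative):

```shell
#!/bin/bash
# Each array element runs as an independent single-core job and
# sees its own index in $SLURM_ARRAY_TASK_ID.
#SBATCH --array=1-40
#SBATCH --ntasks=1
#SBATCH -t 00:01:00
#SBATCH --mem-per-cpu=2048
echo "processing input file input.${SLURM_ARRAY_TASK_ID}"
```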