OpenMPI Job Submission Example

This page provides an example of submitting a simple MPI job to the cluster using the OpenMPI library. It is based on the Basic_MPI_Job (a.k.a. Basic_OpenMPI, a.k.a. HelloUMD-MPI_gcc_openmpi) job template in the OnDemand portal.

This job makes use of a simple Hello World! program called hello-umd, available in the UMD HPC cluster software library, which supports sequential, multithreaded, and MPI modes of operation. The code simply prints an identifying message from each thread of each task --- in this pure MPI case each task consists of a single thread, so it will print one message per MPI task.

Overview

This example consists essentially of a single file, the job script submit.sh (a listing and explanation of the script is given below), which gets submitted to the cluster via the sbatch command.

The script is designed to show many good practices, including:

  • setting standard sbatch options within the script
  • loading the needed modules within the script
  • printing some useful diagnostic information at start of the script
  • creating a job specific work directory
  • running the code and saving the exit code
  • exiting with the exit code from the main application

Many of the practices above are rather overkill for such a simple job --- indeed, the vast majority of lines exist for these "good practices" rather than for running the intended code --- but they are included for educational purposes.
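For instance, the "save the exit code" practice looks like this in isolation. This is a minimal sketch, not part of the actual job script; `false` stands in for the real application:

```shell
# Minimal sketch of the "run the code and save the exit code" practice.
# `false` stands in for the real application (it always exits with code 1).
false
ECODE=$?     # capture the exit code immediately, before any other command
             # (even an echo) can overwrite $?
echo "cleanup or logging can run here without losing the code"
echo "application exited with code ${ECODE}"
```

In the real script the final line is `exit $ECODE`, so the job script reports the application's exit status to Slurm rather than the status of whatever cleanup command happened to run last.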

This job runs hello-umd in MPI mode, saving the output to a file in the temporary work directory and then creating a symlink to that directory from the submission directory. We could have forgone all that and simply let the output of hello-umd go to standard output, which would be available in the slurm-JOBNUMBER.out file (or whatever file you instructed Slurm to use instead). Doing so is acceptable as long as the code does not produce an excessive amount (many MB) of output --- if the code produces a lot of output, sending it all to the Slurm output file can cause problems, and it is better to redirect it to a file.
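As a toy illustration of that redirection pattern (here `printf` and a temporary file stand in for the real, potentially chatty application and its output file):

```shell
# Toy illustration of redirecting a program's output to a file instead of
# letting it land in the Slurm output file.  printf stands in for the
# real application; the temporary file stands in for hello.out.
out=$(mktemp)
printf 'Hello from MPI task %d\n' 0 1 2 > "${out}" 2>&1
# The Slurm output file stays small; the bulk output lives in the file:
wc -l < "${out}"
```

The `2>&1` sends the program's standard error to the same file, just as the `mpirun` line in the script below does.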

The submission script

The submission script submit.sh can be downloaded as plain text. A copy is presented below for discussion:

Source of submit.sh

HelloUMD-MPI job submission script:

#!/bin/bash
# The line above this is the "shebang" line.  It must be the first line in the script.
#-----------------------------------------------------
#	OnDemand Job Template for Hello-UMD, MPI version
#	Runs a simple MPI enabled hello-world code
#-----------------------------------------------------
#
# Slurm sbatch parameters section:
#	Request 60 MPI tasks with 1 CPU core each
#SBATCH -n 60
#SBATCH -c 1
#	Request 5 minutes of walltime
#SBATCH -t 5
#	Request 1 GB of memory per CPU core
#SBATCH --mem-per-cpu=1024
#	Do not allow other jobs to run on same node
#SBATCH --exclusive
#	Run on debug partition for rapid turnaround.  You will need
#	to change this (remove the line) if walltime > 15 minutes
#SBATCH --partition=debug
#       Do not inherit the environment of the process running the
#       sbatch command.  This requires you to explicitly set up the
#       environment for the job in this script, improving reproducibility
#SBATCH --export=NONE
#

# This job will run the MPI enabled version of hello-umd
# We create a directory on a parallel filesystem from which we actually
# will run the job.

# Section to ensure we have the "module" command defined
unalias tap >& /dev/null
if [ -f ~/.bash_profile ]; then
	source ~/.bash_profile
elif [ -f ~/.profile ]; then
	source ~/.profile
fi

# Set SLURM_EXPORT_ENV to ALL.  This prevents the --export=NONE flag
# from being passed to mpirun/srun/etc, which can cause issues.
# We want the environment of the job script to be passed to all
# tasks/processes of the job
export SLURM_EXPORT_ENV=ALL

# Module load section
# First clear our module list
module purge
# and reload the standard modules
module load hpcc/deepthought2
# Load the desired compiler, MPI, and package modules
# NOTE: You need to use the same compiler and MPI module used
# when compiling the MPI-enabled code you wish to run (in this
# case hello-umd).  The values listed below are correct for the
# version of hello-umd we will be using, but you may need to
# change them if you wish to run a different package.
module load gcc/8.4.0
module load openmpi/3.1.5
module load hello-umd/1.5

# Section to make a scratch directory for this job
# Because different MPI tasks, which might be on different nodes, will need
# access to it, we put it in a parallel file system.
# We include the Slurm job ID in the directory name to avoid interference
# if multiple jobs are running at the same time.
TMPWORKDIR="/lustre/$USER/ood-job.${SLURM_JOBID}"
mkdir $TMPWORKDIR
cd $TMPWORKDIR

# Section to output information identifying the job, etc.
echo "Slurm job ${SLURM_JOBID} running on"
hostname
echo "To run on ${SLURM_NTASKS} CPU cores across ${SLURM_JOB_NUM_NODES} nodes"
echo "All nodes: ${SLURM_JOB_NODELIST}"
date
pwd
echo "Loaded modules are:"
module list
echo "Job will be started out of $TMPWORKDIR"

# Setting this variable will suppress the warnings
# about lack of CUDA support on non-GPU enabled nodes.  We
# are not using CUDA, so the warning is harmless.
export OMPI_MCA_mpi_cuda_support=0

# Get the full path to our hello-umd executable.  It is best
# to provide the full path of our executable to mpirun, etc.
MYEXE=`which hello-umd`
echo "Using executable $MYEXE"

# Run our code using mpirun
# We do not specify the number of tasks here, and instead rely on
# it defaulting to the number of tasks requested of Slurm
mpirun  ${MYEXE}  > hello.out 2>&1
# Save the exit code from the previous command
ECODE=$?

# Output from the above command was placed in a work directory in a parallel
# filesystem.  That parallel filesystem does _not_ get cleaned up
# automatically, and it is not normally visible from the Job Composer.
# To deal with this, we make a symlink from the job submit directory to
# the work directory for the job.
#
# NOTE: The work directory will continue to exist until you delete it.  It will
# not get deleted when you delete the job in Job Composer.

ln -s ${TMPWORKDIR} ${SLURM_SUBMIT_DIR}/work-dir

echo "Job finished with exit code $ECODE.  Work dir is $TMPWORKDIR"
date

# Exit with the cached exit code
exit $ECODE
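The work-directory-plus-symlink pattern near the end of the script can be exercised on its own. In this sketch, temporary stand-in paths replace /lustre/$USER and the values Slurm would provide, so it runs anywhere:

```shell
# Stand-ins for values Slurm and the cluster would normally provide:
SLURM_JOBID=${SLURM_JOBID:-12345}  # Slurm sets this in a real job
SLURM_SUBMIT_DIR=$(mktemp -d)      # stands in for the directory sbatch ran in
scratch_root=$(mktemp -d)          # stands in for /lustre/$USER

# Same pattern as the script: a job-specific work directory named after the
# job ID, plus a symlink back to it from the submit directory.
TMPWORKDIR="${scratch_root}/ood-job.${SLURM_JOBID}"
mkdir "${TMPWORKDIR}"
ln -s "${TMPWORKDIR}" "${SLURM_SUBMIT_DIR}/work-dir"
readlink "${SLURM_SUBMIT_DIR}/work-dir"
```

Including the job ID in the directory name is what keeps two simultaneous runs of the same script from clobbering each other's output.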

Running the example

The easiest way to run this example is with the Job Composer of the OnDemand portal, using the HelloUMD-MPI_gcc_openmpi template.

To submit from the command line, just

  1. Download the submit.sh script to the HPC login node.
  2. Run the command sbatch submit.sh. This will submit the job to the scheduler and should return a message like Submitted batch job 23767 --- the number will vary (it is the job number for this job). The job number can be used to reference the job in Slurm, etc. (Please always give the job number(s) when requesting help about a job you submitted.)
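If you want the job number in a script, you can pull it out of that message. A sketch (the message text is what sbatch emits, but this particular number is just an example):

```shell
# sbatch prints "Submitted batch job <N>" on success; extract <N>.
# The message below is a hard-coded example, not real sbatch output.
msg="Submitted batch job 23767"
jobid=${msg##* }      # strip everything up to and including the last space
echo "Job number: ${jobid}"
```

In a real session you can skip the parsing entirely with `jobid=$(sbatch --parsable submit.sh)`, since the --parsable flag makes sbatch print only the job ID.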

Whichever method you used for submission, the job will be queued for the debug partition and should run within 15 minutes or so. When it finishes running, the slurm-JOBNUMBER.out file should contain the output from our diagnostic commands (the times the job started and finished, the module list, etc.). The output of hello-umd will be in the file hello.out in the job-specific work directory created in your lustre directory. For the convenience of users of the OnDemand portal, a symlink to this directory is created in the submission directory, so if you used OnDemand, a symlink to the work directory will appear in the Folder contents section on the right.