OpenMPI Job Submission Example

This page provides an example of submitting a simple MPI job to the cluster using the OpenMPI library. It is based on the Basic_MPI_Job (a.k.a. Basic_OpenMPI, a.k.a. HelloUMD-MPI_gcc_openmpi) job template in the OnDemand portal.

This job makes use of a simple Hello World! program called hello-umd, available in the UMD HPC cluster software library, which supports sequential, multithreaded, and MPI modes of operation. The code simply prints an identifying message from each thread of each task --- in this pure MPI case each task consists of a single thread, so it will print one message per MPI task.

Overview

This example consists essentially of a single file, the job script submit.sh (a listing and explanation of the script is given below), which gets submitted to the cluster via the sbatch command.

The script is designed to show many good practices, including:

  • setting standard sbatch options within the script
  • loading the needed modules within the script
  • printing some useful diagnostic information at start of the script
  • creating a job specific work directory
  • running the code and saving the exit code
  • exiting with the exit code from the main application

Many of the practices above are rather overkill for such a simple job --- indeed, the vast majority of lines exist for these "good practices" rather than for running the intended code --- but they are included for educational purposes.
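For instance, the "save the exit code" practice looks like this in isolation. This is a minimal sketch, not part of the actual job script; `false` stands in for the real application:

```shell
# Minimal sketch of the "run the code and save the exit code" practice.
# `false` stands in for the real application (it always exits with code 1).
false
ECODE=$?     # capture the exit code immediately, before any other command
             # (even an echo) can overwrite $?
echo "cleanup or logging can run here without losing the code"
echo "application exited with code ${ECODE}"
```

In the real script the final line is `exit $ECODE`, so the job script reports the application's exit status to Slurm rather than the status of whatever cleanup command happened to run last.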

This job runs hello-umd in MPI mode, saving the output to a file in the temporary work directory and then creating a symlink to that directory from the submission directory. We could have forgone all that and simply let the output of hello-umd go to standard output, which would be available in the slurm-JOBNUMBER.out file (or whatever file you instructed Slurm to use instead). Doing so is acceptable as long as the code does not produce an excessive amount (many MB) of output --- if the code produces a lot of output, sending it all to the Slurm output file can cause problems, and it is better to redirect it to a file.
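As a toy illustration of that redirection pattern (here `printf` and a temporary file stand in for the real, potentially chatty application and its output file):

```shell
# Toy illustration of redirecting a program's output to a file instead of
# letting it land in the Slurm output file.  printf stands in for the
# real application; the temporary file stands in for hello.out.
out=$(mktemp)
printf 'Hello from MPI task %d\n' 0 1 2 > "${out}" 2>&1
# The Slurm output file stays small; the bulk output lives in the file:
wc -l < "${out}"
```

The `2>&1` sends the program's standard error to the same file, just as the `mpirun` line in the script below does.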

The submission script

The submission script submit.sh can be downloaded as plain text. A copy is presented below for discussion:

Source of submit.sh

HelloUMD-MPI job submission script:

#!/bin/bash
# The line above this is the "shebang" line.  It must be the first line in the script.
#-----------------------------------------------------
#	OnDemand Job Template for Hello-UMD, MPI version
#	Runs a simple MPI enabled hello-world code
#-----------------------------------------------------
#
# Slurm sbatch parameters section:
#	Request 60 MPI tasks with 1 CPU core each
#SBATCH -n 60
#SBATCH -c 1
#	Request 5 minutes of walltime
#SBATCH -t 5
#	Request 1 GB of memory per CPU core
#SBATCH --mem-per-cpu=1024
#	Do not allow other jobs to run on same node
#SBATCH --exclusive
#	Run on debug partition for rapid turnaround.  You will need
#	to change this (remove the line) if walltime > 15 minutes
#SBATCH --partition=debug
#       Do not inherit the environment of the process running the
#       sbatch command.  This requires you to explicitly set up the
#       environment for the job in this script, improving reproducibility
#SBATCH --export=NONE
#

# This job will run the MPI enabled version of hello-umd
# We create a directory on a parallel filesystem from which we actually
# will run the job.

# Section to ensure we have the "module" command defined
unalias tap >& /dev/null
if [ -f ~/.bash_profile ]; then
	source ~/.bash_profile
elif [ -f ~/.profile ]; then
	source ~/.profile
fi

# Set SLURM_EXPORT_ENV to ALL.  This prevents the --export=NONE flag
# from being passed to mpirun/srun/etc, which can cause issues.
# We want the environment of the job script to be passed to all
# tasks/processes of the job
export SLURM_EXPORT_ENV=ALL

# Module load section
# First clear our module list
module purge
# and reload the standard modules
module load hpcc/deepthought2
# Load the desired compiler, MPI, and package modules
# NOTE: You need to use the same compiler and MPI module used
# when compiling the MPI-enabled code you wish to run (in this
# case hello-umd).  The values listed below are correct for the
# version of hello-umd we will be using, but you may need to
# change them if you wish to run a different package.
module load gcc/8.4.0
module load openmpi/3.1.5
module load hello-umd/1.5

# Section to make a scratch directory for this job
# Because different MPI tasks, which might be on different nodes, will need
# access to it, we put it in a parallel file system.
# We include the Slurm job ID in the directory name to avoid interference
# if multiple jobs are running at the same time.
TMPWORKDIR="/lustre/$USER/ood-job.${SLURM_JOBID}"
mkdir $TMPWORKDIR
cd $TMPWORKDIR

# Section to output information identifying the job, etc.
echo "Slurm job ${SLURM_JOBID} running on"
hostname
echo "To run on ${SLURM_NTASKS} CPU cores across ${SLURM_JOB_NUM_NODES} nodes"
echo "All nodes: ${SLURM_JOB_NODELIST}"
date
pwd
echo "Loaded modules are:"
module list
echo "Job will be started out of $TMPWORKDIR"

# Setting this variable will suppress the warnings
# about lack of CUDA support on non-GPU enabled nodes.  We
# are not using CUDA, so the warning is harmless.
export OMPI_MCA_mpi_cuda_support=0

# Get the full path to our hello-umd executable.  It is best
# to provide the full path of our executable to mpirun, etc.
MYEXE=`which hello-umd`
echo "Using executable $MYEXE"

# Run our code using mpirun
# We do not specify the number of tasks here, and instead rely on
# it defaulting to the number of tasks requested of Slurm
mpirun  ${MYEXE}  > hello.out 2>&1
# Save the exit code from the previous command
ECODE=$?

# Output from the above command was placed in a work directory in a parallel
# filesystem.  That parallel filesystem does _not_ get cleaned up
# automatically, and it is not normally visible from the Job Composer.
# To deal with this, we make a symlink from the job submit directory to
# the work directory for the job.
#
# NOTE: The work directory will continue to exist until you delete it.  It will
# not get deleted when you delete the job in Job Composer.

ln -s ${TMPWORKDIR} ${SLURM_SUBMIT_DIR}/work-dir

echo "Job finished with exit code $ECODE.  Work dir is $TMPWORKDIR"
date

# Exit with the cached exit code
exit $ECODE
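The work-directory-plus-symlink pattern near the end of the script can be exercised on its own. In this sketch, temporary stand-in paths replace /lustre/$USER and the values Slurm would provide, so it runs anywhere:

```shell
# Stand-ins for values Slurm and the cluster would normally provide:
SLURM_JOBID=${SLURM_JOBID:-12345}  # Slurm sets this in a real job
SLURM_SUBMIT_DIR=$(mktemp -d)      # stands in for the directory sbatch ran in
scratch_root=$(mktemp -d)          # stands in for /lustre/$USER

# Same pattern as the script: a job-specific work directory named after the
# job ID, plus a symlink back to it from the submit directory.
TMPWORKDIR="${scratch_root}/ood-job.${SLURM_JOBID}"
mkdir "${TMPWORKDIR}"
ln -s "${TMPWORKDIR}" "${SLURM_SUBMIT_DIR}/work-dir"
readlink "${SLURM_SUBMIT_DIR}/work-dir"
```

Including the job ID in the directory name is what keeps two simultaneous runs of the same script from clobbering each other's output.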

Running the example

The easiest way to run this example is with the Job Composer of the OnDemand portal, using the HelloUMD-MPI_gcc_openmpi template.

To submit from the command line, just

  1. Download the submit.sh script to the HPC login node.
  2. Run the command sbatch submit.sh. This will submit the job to the scheduler and should return a message like Submitted batch job 23767 --- the number will vary (it is the job number for this job). The job number can be used to reference the job in Slurm, etc. (Please always give the job number(s) when requesting help about a job you submitted.)
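If you want the job number in a script, you can pull it out of that message. A sketch (the message text is what sbatch emits, but this particular number is just an example):

```shell
# sbatch prints "Submitted batch job <N>" on success; extract <N>.
# The message below is a hard-coded example, not real sbatch output.
msg="Submitted batch job 23767"
jobid=${msg##* }      # strip everything up to and including the last space
echo "Job number: ${jobid}"
```

In a real session you can skip the parsing entirely with `jobid=$(sbatch --parsable submit.sh)`, since the --parsable flag makes sbatch print only the job ID.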

Whichever method you used for submission, the job will be queued for the debug partition and should run within 15 minutes or so. When it finishes running, the slurm-JOBNUMBER.out file should contain the output from our diagnostic commands (the times the job started and finished, the module list, etc.). The output of hello-umd will be in the file hello.out in the job-specific work directory created in your lustre directory. For the convenience of users of the OnDemand portal, a symlink to this directory is created in the submission directory, so if you used OnDemand, a symlink to the work directory will appear in the Folder contents section on the right.