Python MPI Job Submission Example

This page provides an example of submitting a simple MPI job using Python, specifically the mpi4py Python package.

The job runs a simple MPI-enabled "Hello, world!" style script that prints a single line from each task identifying its rank and the node it is running on.

We provide line-by-line descriptions of both the submission script and the python script.

The submission script

The submission script is similar to those for other MPI jobs, and can be downloaded here. We also present it here:

Basic_Python_MPI_Job job submission script

#!/bin/bash
# The line above this is the "shebang" line.  It must be the first line in the script
#-----------------------------------------------------
#	Open OnDemand Job Template
#	For a basic python job using MPI
#-----------------------------------------------------
#
# Slurm sbatch parameters section:
#	Request 60 tasks with one CPU core each
#SBATCH -n 60
#SBATCH -c 1
#	Request 5 minutes of walltime
#SBATCH -t 5
#	Request 1 GB of memory per CPU core
#SBATCH --mem-per-cpu=1024
#	Do not allow other jobs to run on the same node
#SBATCH --exclusive
#	Run on debug partition for rapid turnaround.  You will need
#	to change this (remove the line) if walltime > 15 minutes
#SBATCH --partition=debug
#	Do not inherit the environment of the process running the
#	sbatch command.  This requires you to explicitly set up the
#	environment for the job in this script, improving reproducibility
#SBATCH --export=NONE
#

# This job will run the code in the hello_mpi.py script from the submission dir.
# We create a working directory on the parallel file system, run the job
# from there, and then make a symlink to the working dir in the submission dir

# Section to ensure we have the "module" command defined
unalias tap >& /dev/null
if [ -f ~/.bash_profile ]; then
	source ~/.bash_profile
elif [ -f ~/.profile ]; then
	source ~/.profile
fi

# Set SLURM_EXPORT_ENV to ALL.  This prevents the --export=NONE flag
# from being passed to mpirun/srun/etc, which can cause issues.
# We want the environment of the job script to be passed to all
# tasks/processes of the job
export SLURM_EXPORT_ENV=ALL

# Module load section
# First clear our module list
module purge
# and reload the standard modules
module load hpcc/deepthought2
# Load the desired compiler, MPI, and python modules
# NOTE: You need to use the same compiler and MPI modules that were used
# when building python (and its mpi4py package).  The values listed
# below are correct at the time of writing; you may need to change them
# if you change the python version.
module load gcc/8.4.0
module load openmpi/3.1.5
module load python/3.7.7

# Section to make a scratch directory for this job.
# Because different MPI tasks, which might be on different nodes, need
# access to it, we put it in lustre.  We include the Slurm job id in the
# directory name to avoid interference if multiple jobs run at the same time.
TMPWORKDIR="/lustre/$USER/ood-mpi4py.${SLURM_JOBID}"
mkdir $TMPWORKDIR
cd $TMPWORKDIR

# Section to output information identifying the job, etc.
echo "Slurm job ${SLURM_JOBID} running on"
hostname
echo "To run on ${SLURM_NTASKS} CPU cores across ${SLURM_JOB_NUM_NODES} nodes"
echo "All nodes: ${SLURM_JOB_NODELIST}"
date
pwd
echo "Loaded modules are:"
module list

# Setting this variable will suppress the warnings
# about lack of CUDA support on non-GPU enabled nodes.  We
# are not using CUDA, so the warning is harmless.
export OMPI_MCA_mpi_cuda_support=0

# Get the full path to our python executable.  It is best
# to provide the full path of our executable to mpirun, etc.
MYPYTHON=`which python`
echo "Using python $MYPYTHON"

# Run our script using mpirun.
# We do not specify the number of tasks here, and instead rely on
# it defaulting to the number of tasks requested of Slurm
mpirun ${MYPYTHON} ${SLURM_SUBMIT_DIR}/hello_mpi.py > hello.out 2>&1
# Save the exit code from the previous command
ECODE=$?

# Symlink our working dir back into the submit dir
ln -s ${TMPWORKDIR} ${SLURM_SUBMIT_DIR}/work-dir

echo "Job finished with exit code $ECODE.  Working dir is $TMPWORKDIR"
date

# Exit with the cached exit code
exit $ECODE
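Because the script depends on the loaded python's mpi4py being built against the loaded MPI stack, it can help to verify the pairing interactively before submitting. The snippet below is a suggestion of ours, not part of the original example; it degrades gracefully when mpi4py is not importable:

```python
# Report which MPI library the installed mpi4py was built against.
# The first line of the version string names the implementation
# (e.g. an Open MPI version) and should match the loaded MPI module.
try:
    from mpi4py import MPI
    mpi_info = MPI.Get_library_version().splitlines()[0]
except ImportError:
    mpi_info = "mpi4py is not importable in this environment"
print(mpi_info)
```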

The python script

The python script is fairly simple (as far as MPI scripts go) because it does not do much. The python script can be downloaded here, and we describe it line by line below:
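As an aid to the description that follows, here is a minimal sketch of hello_mpi.py reconstructed from that description (the line numbers will not align exactly with the numbers cited below, and the ImportError fallback is our addition so the sketch also runs where mpi4py is unavailable):

```python
#!/bin/env python
#
# hello_mpi.py: a minimal mpi4py "Hello, world!" sketch.
# Each MPI task prints one line identifying its rank and its node.

import sys
import platform

VERSION = "1.5"

def hello_mpi():
    # The ImportError fallback is our addition; the original script
    # simply imports mpi4py, which is always available on the cluster.
    try:
        from mpi4py import MPI
        comm = MPI.COMM_WORLD    # communicator spanning all tasks in the job
        rank = comm.Get_rank()   # this task's id, 0 .. size-1
        size = comm.Get_size()   # total number of tasks
    except ImportError:
        rank, size = 0, 1

    # Name of the compute node this task is running on
    node = platform.node()

    # Only rank 0 prints the identification line, to reduce output
    if rank == 0:
        print("HELLO_MPI: Version", VERSION)

    # Every task prints its own line
    print("Hello, world! from MPI task ", rank, " of ", size,
          " on node ", node)

if __name__ == "__main__":
    # Run only when executed as a script, not when imported
    hello_mpi()
```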

Line 1: The Unix shebang
This is the standard Unix shebang line, which defines which program should be used to interpret the script. The shebang MUST be the first line of the script --- it is not recognized if any lines, even comments or blank lines, precede it. The Slurm scheduler likewise requires that your job script start with a shebang line.

This is a python script, so our shebang runs python. In particular, the sequence #!/bin/env python is a common Unix shebang idiom instructing the system to use whichever python command is found in your PATH.

Lines 3-38: Define the hello_mpi function
This defines the main hello_mpi function for this script. This function actually does all of the real work for this script, and we examine it in detail below.
Lines 5-15: Comments
These lines are comments. They are ignored by the python interpreter, but can provide useful information to people who are reading the script. It is very good practice to put comments in your code so other people (and perhaps even you looking back on the code long after you first wrote it) can quickly figure out what it is doing.

This particular block of comments just provides some basic identification of the script: what it does, where it came from, etc. Other comments throughout the script give a brief description of what is being done.

Lines 18-20: Imports
These lines import some Python modules that we will need. The sys module is generally useful, the platform module is used to get the name of the compute node the task is running on, and the mpi4py module provides the interface to the MPI libraries.
Line 23: Get our node name
This line obtains the name of the node we are running on and saves it to the Python variable named node. Each MPI task starts its own copy of the python interpreter running this script, and since an MPI job can in general run on multiple nodes, the value saved in node can differ between tasks.
Line 26: Get MPI Communicator
This line retrieves the default MPI communicator, MPI_COMM_WORLD, for use by our code. Communicator objects are what MPI uses to connect the various tasks/processes associated with an MPI session, and the MPI_COMM_WORLD communicator includes all of the tasks that are part of the job.
Lines 28-29
These lines use the Get_rank and Get_size methods of our previously obtained MPI communicator to obtain the rank and size. The size is the number of processes in the communicator, and so should be the same for all tasks. The rank is a unique integer identifier, ranging from 0 to size - 1, identifying the task, so it will be distinct for each task in the job.
Lines 31-33: Identifying ourselves
These lines print out a short identification line including the version of our script. To (slightly) reduce the amount of output, we wrap this in an if statement so that it is only printed by the first task (rank 0).
Line 36: Print our rank
This line prints some information about the current task, including its rank and the node it is running on. Unlike the previous line, this runs for all tasks.
Lines 41-42: Invoke our function
These lines invoke our hello_mpi function, guarded by a standard Python idiom that allows the same file to be used either as a library or as a script: hello_mpi is called when the file is executed as a Python script, but not when the file is imported.
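The idiom can be seen in isolation with any function (the name below is hypothetical, not from the script):

```python
def greet():
    print("hello from greet()")

if __name__ == "__main__":
    # __name__ is "__main__" only when this file is run directly
    # (e.g. python thisfile.py); when the file is imported as a
    # module, __name__ is the module name instead, so greet() is
    # not called automatically on import.
    greet()
```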

Running the examples

The easiest way to run this example is with the Job Composer of the OnDemand portal, using the Basic_Python_MPI template.

To submit from the command line:

  1. Download the submit script and the python script to the login node on the cluster.
  2. Run the command sbatch python_mpi.sh. This will submit the job to the scheduler, and should return a message like Submitted batch job 23767 --- the number will vary (it is the job number for this job). The job number can be used to reference the job in Slurm, etc. (Please always give the job number(s) when requesting help about a job you submitted.)

Whichever method you use for submission, the job will be queued on the debug partition and should run within 15 minutes or so. When it finishes, the slurm-JOBNUMBER.out file should contain the output from our diagnostic commands (the times the job started and finished, the module list, etc.). The output of the hello_mpi.py script will be in the file hello.out in the job-specific work directory created in your lustre directory. For the convenience of users of the OnDemand portal, a symlink to this directory is created in the submission directory; so if you used OnDemand, a symlink to the work directory will appear in the Folder contents section on the right.

The hello.out file will resemble:

Hello, world! from MPI task  3  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  4  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  1  of  60  on node  compute-10-0.juggernaut.umd.edu
HELLO_MPI: Version 1.5
Hello, world! from MPI task  2  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  0  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  5  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  8  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  6  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  9  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  10  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  11  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  7  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  12  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  13  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  14  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  15  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  16  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  17  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  18  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  19  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  20  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  21  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  22  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  23  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  24  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  25  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  26  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  27  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  28  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  29  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  30  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  31  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  32  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  33  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  34  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  35  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  36  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  37  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  38  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  39  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  40  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  41  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  42  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  43  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  44  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  45  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  46  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  47  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  48  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  49  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  50  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  51  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  52  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  53  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  54  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  55  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  56  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  57  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  58  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  59  of  60  on node  compute-10-1.juggernaut.umd.edu

You should see a message from each task, 0 through 59, in some random order, with the identifying line (containing the version number) somewhere in the mix. Because everything runs in parallel, the order will vary from run to run. Note that the tasks are divided across multiple nodes (in this case compute-10-0 and compute-10-1). On Juggernaut, the 60 cores require two nodes; on Deepthought2, they would require three.