Python MPI Job Submission Example

This page provides an example of submitting a simple MPI job using Python, specifically the mpi4py Python package.

The job runs a simple MPI-enabled "Hello, world!" style script that prints a single line from each task identifying its rank and the node it is running on.

We provide line-by-line descriptions of both the submission script and the python script.

The submission script

The submission script is similar to those for other MPI jobs, and can be downloaded here. We also present it here:

Basic_Python_MPI_Job job submission script

#!/bin/bash
# The line above this is the "shebang" line.  It must be the first line in the script
#-----------------------------------------------------
#	Open OnDemand Job Template
#	For a basic python job using MPI
#-----------------------------------------------------
#
# Slurm sbatch parameters section:
#	Request 60 tasks with one CPU core each
#SBATCH -n 60
#SBATCH -c 1
#	Request 5 minutes of walltime
#SBATCH -t 5
#	Request 1 GB of memory per CPU core
#SBATCH --mem-per-cpu=1024
#	Do not allow other jobs to run on the same node
#SBATCH --exclusive
#	Run on debug partition for rapid turnaround.  You will need
#	to change this (remove the line) if walltime > 15 minutes
#SBATCH --partition=debug
#	Do not inherit the environment of the process running the
#	sbatch command.  This requires you to explicitly set up the
#	environment for the job in this script, improving reproducibility
#SBATCH --export=NONE
#

# This job will run the code in the hello_mpi.py script from the submission dir.
# We create a working directory on the parallel file system, run the job
# from there, and then make a symlink to the working dir in the submission dir

# Section to ensure we have the "module" command defined
unalias tap >& /dev/null
if [ -f ~/.bash_profile ]; then
	source ~/.bash_profile
elif [ -f ~/.profile ]; then
	source ~/.profile
fi

# Set SLURM_EXPORT_ENV to ALL.  This prevents the --export=NONE flag
# from being passed to mpirun/srun/etc, which can cause issues.
# We want the environment of the job script to be passed to all
# tasks/processes of the job
export SLURM_EXPORT_ENV=ALL

# Module load section
# First clear our module list
module purge
# and reload the standard modules
module load hpcc/deepthought2
# Load the desired compiler, MPI, and python modules
# NOTE: You need to use the same compiler and MPI modules that were used
# when building python (and its mpi4py package).  The values listed
# below are correct at the time of writing; you may need to change them
# if you change the python version.
module load gcc/8.4.0
module load openmpi/3.1.5
module load python/3.7.7

# Section to make a scratch directory for this job.
# Because different MPI tasks, which might be on different nodes, need
# access to it, we put it in lustre.  We include the Slurm job id in the
# directory name to avoid interference if multiple jobs run at the same time.
TMPWORKDIR="/lustre/$USER/ood-mpi4py.${SLURM_JOBID}"
mkdir $TMPWORKDIR
cd $TMPWORKDIR

# Section to output information identifying the job, etc.
echo "Slurm job ${SLURM_JOBID} running on"
hostname
echo "To run on ${SLURM_NTASKS} CPU cores across ${SLURM_JOB_NUM_NODES} nodes"
echo "All nodes: ${SLURM_JOB_NODELIST}"
date
pwd
echo "Loaded modules are:"
module list

# Setting this variable will suppress the warnings
# about lack of CUDA support on non-GPU enabled nodes.  We
# are not using CUDA, so the warning is harmless.
export OMPI_MCA_mpi_cuda_support=0

# Get the full path to our python executable.  It is best
# to provide the full path of our executable to mpirun, etc.
MYPYTHON=`which python`
echo "Using python $MYPYTHON"

# Run our script using mpirun.
# We do not specify the number of tasks here, and instead rely on
# it defaulting to the number of tasks requested of Slurm
mpirun ${MYPYTHON} ${SLURM_SUBMIT_DIR}/hello_mpi.py > hello.out 2>&1
# Save the exit code from the previous command
ECODE=$?

# Symlink our working dir back into the submit dir
ln -s ${TMPWORKDIR} ${SLURM_SUBMIT_DIR}/work-dir

echo "Job finished with exit code $ECODE.  Working dir is $TMPWORKDIR"
date

# Exit with the cached exit code
exit $ECODE
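Because the script depends on the loaded python's mpi4py being built against the loaded MPI stack, it can help to verify the pairing interactively before submitting. The snippet below is a suggestion of ours, not part of the original example; it degrades gracefully when mpi4py is not importable:

```python
# Report which MPI library the installed mpi4py was built against.
# The first line of the version string names the implementation
# (e.g. an Open MPI version) and should match the loaded MPI module.
try:
    from mpi4py import MPI
    mpi_info = MPI.Get_library_version().splitlines()[0]
except ImportError:
    mpi_info = "mpi4py is not importable in this environment"
print(mpi_info)
```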

The python script

The python script is fairly simple (as far as MPI scripts go) because it does not do much. The python script can be downloaded here, and we describe it line by line below:
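As an aid to the description that follows, here is a minimal sketch of hello_mpi.py reconstructed from that description (the line numbers will not align exactly with the numbers cited below, and the ImportError fallback is our addition so the sketch also runs where mpi4py is unavailable):

```python
#!/bin/env python
#
# hello_mpi.py: a minimal mpi4py "Hello, world!" sketch.
# Each MPI task prints one line identifying its rank and its node.

import sys
import platform

VERSION = "1.5"

def hello_mpi():
    # The ImportError fallback is our addition; the original script
    # simply imports mpi4py, which is always available on the cluster.
    try:
        from mpi4py import MPI
        comm = MPI.COMM_WORLD    # communicator spanning all tasks in the job
        rank = comm.Get_rank()   # this task's id, 0 .. size-1
        size = comm.Get_size()   # total number of tasks
    except ImportError:
        rank, size = 0, 1

    # Name of the compute node this task is running on
    node = platform.node()

    # Only rank 0 prints the identification line, to reduce output
    if rank == 0:
        print("HELLO_MPI: Version", VERSION)

    # Every task prints its own line
    print("Hello, world! from MPI task ", rank, " of ", size,
          " on node ", node)

if __name__ == "__main__":
    # Run only when executed as a script, not when imported
    hello_mpi()
```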

Line 1: The Unix shebang
This is the standard Unix shebang line, which defines which program should be used to interpret the script. The shebang MUST be the first line of the script --- it is not recognized if any lines, even comments or blank lines, precede it. The Slurm scheduler likewise requires that your job script start with a shebang line.

This is a python script, so our shebang runs python. In particular, the sequence #!/bin/env python is a common Unix shebang idiom instructing the system to use whichever python command is found in your PATH.

Lines 3-38: Define the hello_mpi function
This defines the main hello_mpi function for this script. This function actually does all of the real work for this script, and we examine it in detail below.
Lines 5-15: Comments
These lines are comments. They are ignored by the python interpreter, but can provide useful information to people who are reading the script. It is very good practice to put comments in your code so other people (and perhaps even you looking back on the code long after you first wrote it) can quickly figure out what it is doing.

This particular block of comments just provides some basic identification of the script: what it does, where it came from, etc. Other comments throughout the script give a brief description of what is being done.

Lines 18-20: Imports
These lines import some Python modules that we will need. The sys module is generally useful, the platform module is used to get the name of the compute node the task is running on, and the mpi4py module provides the interface to the MPI libraries.
Line 23: Get our node name
This line obtains the name of the node we are running on and saves it to the Python variable named node. Each MPI task starts its own copy of the python interpreter running this script, and since an MPI job can in general run on multiple nodes, the value saved in node can differ between tasks.
Line 26: Get MPI Communicator
This line retrieves the default MPI communicator, MPI_COMM_WORLD, for use by our code. Communicator objects are what MPI uses to connect the various tasks/processes associated with an MPI session, and the MPI_COMM_WORLD communicator includes all of the tasks that are part of the job.
Lines 28-29
These lines use the Get_rank and Get_size methods of our previously obtained MPI communicator to obtain the rank and size. The size is the number of processes in the communicator, and so should be the same for all tasks. The rank is a unique integer identifier, ranging from 0 to size - 1, identifying the task, so it will be distinct for each task in the job.
Lines 31-33: Identifying ourselves
These lines print out a short identification line including the version of our script. To (slightly) reduce the amount of output, we wrap this in an if statement so that it is only printed by the first task (rank 0).
Line 36: Print our rank
This line prints some information about the current task, including its rank and the node it is running on. Unlike the previous line, this runs for all tasks.
Lines 41-42: Invoke our function
These lines invoke our hello_mpi function, guarded by a standard Python idiom that allows the same file to be used either as a library or as a script: hello_mpi is called when the file is executed as a Python script, but not when the file is imported.
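The idiom can be seen in isolation with any function (the name below is hypothetical, not from the script):

```python
def greet():
    print("hello from greet()")

if __name__ == "__main__":
    # __name__ is "__main__" only when this file is run directly
    # (e.g. python thisfile.py); when the file is imported as a
    # module, __name__ is the module name instead, so greet() is
    # not called automatically on import.
    greet()
```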

Running the examples

The easiest way to run this example is with the Job Composer of the OnDemand portal, using the Basic_Python_MPI template.

To submit from the command line:

  1. Download the submit script and the python script to the login node on the cluster.
  2. Run the command sbatch python_mpi.sh. This will submit the job to the scheduler, and should return a message like Submitted batch job 23767 --- the number will vary (it is the job number for this job). The job number can be used to reference the job in Slurm, etc. (Please always give the job number(s) when requesting help about a job you submitted.)

Whichever method you use for submission, the job will be queued on the debug partition and should run within 15 minutes or so. When it finishes, the slurm-JOBNUMBER.out file should contain the output from our diagnostic commands (the times the job started and finished, the module list, etc.). The output of the hello_mpi.py script will be in the file hello.out in the job-specific work directory created in your lustre directory. For the convenience of users of the OnDemand portal, a symlink to this directory is created in the submission directory; so if you used OnDemand, a symlink to the work directory will appear in the Folder contents section on the right.

The hello.out file will resemble:

Hello, world! from MPI task  3  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  4  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  1  of  60  on node  compute-10-0.juggernaut.umd.edu
HELLO_MPI: Version 1.5
Hello, world! from MPI task  2  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  0  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  5  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  8  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  6  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  9  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  10  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  11  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  7  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  12  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  13  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  14  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  15  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  16  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  17  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  18  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  19  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  20  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  21  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  22  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  23  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  24  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  25  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  26  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  27  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  28  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  29  of  60  on node  compute-10-0.juggernaut.umd.edu
Hello, world! from MPI task  30  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  31  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  32  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  33  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  34  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  35  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  36  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  37  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  38  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  39  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  40  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  41  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  42  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  43  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  44  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  45  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  46  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  47  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  48  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  49  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  50  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  51  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  52  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  53  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  54  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  55  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  56  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  57  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  58  of  60  on node  compute-10-1.juggernaut.umd.edu
Hello, world! from MPI task  59  of  60  on node  compute-10-1.juggernaut.umd.edu

You should see a message from each task, 0 through 59, in some random order, with the identifying line (containing the version number) somewhere in the mix. Because everything runs in parallel, the order will vary from run to run. Note that the tasks are divided across multiple nodes (in this case compute-10-0 and compute-10-1). On Juggernaut, the 60 cores require two nodes; on Deepthought2, they would require three.