python: Python scripting language

WARNING
*** DEPRECATED ***

NOTE: The authors of Python have stopped maintenance of all Python2 versions as of 1 Jan 2020. While the UMD Division of Information Technology continues to provide access to the existing Python2 installs for now, all Python2 installations are ***DEPRECATED*** and will not be upgraded, have new extensions installed, etc. It is likely that Python2 will not be made available when the Zaratan cluster is stood up.

All Python users are strongly encouraged to migrate to Python3.

Contents

  1. Overview of package
    1. General usage
  2. Availability of package by cluster
  3. Installing modules
    1. Using setup.py
    2. Using virtual environments and pip
  4. Assorted Tips and Tricks
    1. Matplotlib Tricks
  5. Numba and GPU Support
  6. Using python with MPI

Overview of package

General information about package
Package: python
Description: Python scripting language
For more information: https://www.python.org
Categories:
License: OpenSource (Python Software Foundation)

General usage information

Python is a high-level scripting language.

This module will add the python and related commands to your path.

In case you need to link against this library in your code, the following environmental variables have been defined:

  • $PYTHON_ROOT has been set to the root of the python installation
  • $PYTHON_LIBDIR points to the directory containing the libraries
  • $PYTHON_INCDIR points to the directory containing the header files

You will probably wish to use these by adding the following flags to your compilation command (e.g. to CFLAGS in your Makefile):

  • -I$PYTHON_INCDIR
and the following flags to your link command (e.g. LDFLAGS in your Makefile):
  • -L$PYTHON_LIBDIR -Wl,-rpath,$PYTHON_LIBDIR
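
The same two directories can usually be recovered from the interpreter itself, which is handy as a cross-check that the flags point at the python you intend to link against. The following is a minimal sketch using the standard sysconfig module; the helper name build_flags is illustrative only, and the values it reports describe whichever python is running the script, so they should agree with $PYTHON_INCDIR and $PYTHON_LIBDIR only when the corresponding module is loaded.

```python
import sysconfig

def build_flags():
    """Return (compile_flags, link_flags) for the running interpreter."""
    include_dir = sysconfig.get_paths()["include"]
    lib_dir = sysconfig.get_config_var("LIBDIR")  # may be None on some platforms
    compile_flags = "-I" + include_dir
    link_flags = ""
    if lib_dir:
        # Same pattern as the LDFLAGS suggestion above
        link_flags = "-L{0} -Wl,-rpath,{0}".format(lib_dir)
    return compile_flags, link_flags

cflags, ldflags = build_flags()
print("CFLAGS :", cflags)
print("LDFLAGS:", ldflags)
```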

Available versions of the package python, by cluster

This section lists the available versions of the package python on the different clusters.

Available versions of python on the Deepthought2 cluster (RHEL8)

Version   Module tags     CPU(s) optimized for   GPU ready?
2.7.16    python/2.7.16   ivybridge, x86_64      Y
3.8.2     python/3.8.2    ivybridge, x86_64      Y
3.7.7     python/3.7.7    ivybridge, x86_64      Y

Available versions of python on the Juggernaut cluster

Version   Module tags     CPU(s) optimized for           GPU ready?
3.7.7     python/3.7.7    skylake_avx512, x86_64, zen    Y

Available versions of python on the Deepthought2 cluster (RHEL6) [DEPRECATED]

Version   Module tags     CPU(s) optimized for   GPU ready?
3.7.3     python/3.7.3    ivybridge              N
3.5.1     python/3.5.1    ivybridge              N
3.2.3     python/3.2.3    ivybridge              N
2.7.8     python/2.7.8    ivybridge              N

When using in conjunction with your own code, you might wish to note the compiler and MPI libraries used when the python binaries and packages were built. MPI in particular can be fussy and generate strange errors if the different parts of the code are linked against different MPI libraries (even different versions of OpenMPI or the same version of OpenMPI built with a different compiler), or if the mpirun command used to start the code is from a different MPI version or was built with a different compiler. In general, it is best to ensure everything is built with the same compiler and, if used, the same MPI library.

Installing modules

Python's capabilities can be significantly enhanced through the addition of modules. Code can import a module to enable its functionality.

The supported python interpreters on the system have a selection of modules preinstalled. If a module you are interested in is not in that list, you can either install a personal copy of the module for yourself, or request that it be installed site wide. We will make reasonable efforts to accommodate such requests as staffing resources allow.

Installing modules yourself

  • Using setup.py
  • Using virtual environments and pip
    The mechanism for installing a module depends on the module being installed, but most modern python modules support the setup.py mechanism described below. Many packages also support installation via pip and virtual environments, which is typically easier.

    Installing python modules using setup.py

    Note: Users might wish to look at Installing python modules using virtual environments first, as that is often easier.

    The standard procedure for installing your own copy of a module is:

    1. module load python/X.Y.Z to select the version of python you wish to use.
    2. Create a directory to contain your python module, if not already done. Typically, you will want one directory to house all of the modules you are installing, so something like mkdir ~/.mypython will work. You should also create lib and lib/python directories beneath it, e.g. mkdir ~/.mypython/lib ~/.mypython/lib/python.
    3. You will need to tell python where to look for your modules. Assuming you are putting your modules under ~/.mypython, use something like setenv PYTHONPATH ~/.mypython/lib/python (bash/bourne shell users should do PYTHONPATH=~/.mypython/lib/python; export PYTHONPATH). You probably want to add this to your .cshrc.mine or .bashrc.mine.
    4. Download and unzip/untar/etc. the module sources. Cd to the main module source directory (it should contain the file setup.py).
    5. Run python setup.py install --home ~/.mypython

    If all goes well, the module should now be installed under ~/.mypython or wherever you specified. If there are executables associated with it, they should be in ~/.mypython/bin. You should be able to import the module in python now (this assumes that PYTHONPATH is set as indicated above).
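
To confirm that python is actually picking up the personal module directory, you can inspect the interpreter's search path from within python. This is a minimal sketch, assuming the ~/.mypython/lib/python location used in the steps above; the function name where_is is hypothetical, and json merely stands in for the name of the module you installed.

```python
import importlib.util
import os
import sys

# Assumed location from the steps above; adjust if you chose another directory
personal_dir = os.path.expanduser("~/.mypython/lib/python")

def where_is(module_name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin

# True only if PYTHONPATH (or another mechanism) added the directory
on_path = any(os.path.abspath(p) == personal_dir for p in sys.path if p)
print("personal directory on sys.path:", on_path)
print("json would be loaded from:", where_is("json"))
```

If the first line reports False, PYTHONPATH was most likely not set in the shell that started python.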

    Of course, not all modules install easily. Unfortunately, the install process can fail in more ways than can reasonably be enumerated. If you are comfortable with building modules, the error messages may give enough guidance to get the module to build, but it is probably easiest to just request that the module be installed to the system libraries.

    Installing modules yourself using virtual environments

    Although the standard procedure described above works for most cases, there are cases where more separation is required. Python3 includes a venv module which allows you to create a fully independent virtual python environment, copying the python executables and standard and system libraries to your own directory, and allowing you to add/update/delete from there. This has the advantage that the virtualenv is almost completely isolated, so changes made in the system installation of python are unlikely to impact your virtualenv. This can be important if you have a code or application which requires e.g. version 1.6 of the foo package, but will break if it is upgraded to 1.7. (It appears that when using the standard scheme above with PYTHONPATH, the system library directories are ALWAYS searched before PYTHONPATH, meaning that method can be used to add modules, but not to upgrade or downgrade modules.)

    However, the virtualenv takes up a significant amount of disk space, and the isolation from the system python can be a negative as well: upgrades and/or new modules added to the system python will NOT be visible. This is good when, as in the example above, an upgrade would break something, but most of the time the upgrades are desirable.

    To install a package with the virtualenv mechanism, you must first create a virtual python environment.

    1. module load python/X.Y.Z to select the version of python you wish to use in this virtual environment.
    2. You should select a directory where the virtual python environment should live. Each virtual environment you create will be a subdirectory of this directory. For the examples below, we will use the my-venv subdirectory of your home directory (e.g. ~/my-venv).
    3. For python3, you can create a virtual env with either of the simple commands:
      1. python -m venv --system-site-packages ~/my-venv: This variant will give the virtual environment access to system installed python packages, e.g. numpy, scipy and matplotlib. This is the easiest version, but as it is less isolated from the system python installation it can lead to problems if there are version compatibility issues.
      2. python -m venv ~/my-venv: This variant will isolate the resulting environment from system packages. This is the safest approach, but may require you to install packages available on the system, and can be trickier in some cases.
    4. For python2, you need to create the empty directory (mkdir ~/my-venv), and then unset the PYTHONHOME variable set by the module load command (i.e. unsetenv PYTHONHOME for the csh and tcsh shells, and unset PYTHONHOME for bash). Then issue either of the following commands:
      1. virtualenv --system-site-packages ~/my-venv
      2. virtualenv ~/my-venv
      The first version, with the --system-site-packages flag, behaves like the python3 version with the same flag --- the system packages are still available. The second version isolates your virtual environment from the system packages.
    5. In order to use this virtual python environment (for either python version), you must first activate it. This must be done in every process using the virtual environment. You do this by issuing one of the following commands:
      1. source ~/my-venv/bin/activate.csh for csh or tcsh shells
      2. source ~/my-venv/bin/activate for bash shells
    6. When you are finished with the environment, the command deactivate will deactivate the virtual environment.
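
Because activation must be repeated in every process (including batch jobs), it can be useful for a script to report whether it is really running inside the virtual environment. The following is a small sketch, not a site-provided tool; the function name in_virtual_environment is hypothetical. It relies on sys.base_prefix, which python3 sets for venv environments, and on sys.real_prefix, which older virtualenv releases set instead.

```python
import sys

def in_virtual_environment():
    """True when the running interpreter belongs to a virtual environment."""
    # In a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the python it was created from.
    base = getattr(sys, "base_prefix", sys.prefix)
    # Older virtualenv releases (e.g. for python2) set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != base

print("interpreter :", sys.executable)
print("prefix      :", sys.prefix)
print("inside venv :", in_virtual_environment())
```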

    Once the virtual environment is created and activated, installation is usually relatively simple using the pip command. You should just be able to do pip install NameOfPackage. Pip should take care of downloading the package and installing it for you.
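
After pip finishes, you may wish to check from python which version was actually installed, especially when a specific version is required. This is a minimal sketch assuming Python 3.8 or newer (where importlib.metadata is part of the standard library); the helper name installed_version is illustrative, and NameOfPackage is the same placeholder used in the text above.

```python
from importlib import metadata  # standard library in Python 3.8 or newer

def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None

# Replace the placeholder with the package you installed via pip
print(installed_version("NameOfPackage"))
```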

    Of course, not all modules install easily. Unfortunately, the install process can fail in more ways than can reasonably be enumerated. If you are comfortable with building modules, the error messages may give enough guidance to get the module to build, but it is probably easiest to just request that the module be installed to the system libraries.


    Assorted Tips and Tricks

    Matplotlib Tricks

    • Using matplotlib in batch jobs/without an X server: By default, the matplotlib package in Python expects to work with a graphical user interface (GUI), which on Unix-like systems means an X server running. This can be problematic if one wishes to use matplotlib in batch jobs (e.g. on an HPC cluster) because typically a display will not be available. The easiest way around this is to specify a non-interactive backend. There are several ways to do this, but since you probably want to continue using an interactive backend when using python interactively, the best approach is to have your batch code select a non-interactive backend. A common choice is Agg (for Anti-Grain Geometry engine), which can produce PNG files; Cairo and Gdk are other options. Use would be something like:
              import matplotlib
              # This needs to be done *before* importing pyplot or pylab
              matplotlib.use('Agg')
              import matplotlib.pyplot as plt
      
              #Do your plotting, e.g.
              fig = plt.figure()
              ax = fig.add_subplot(111)
              ax.plot(range(10))
              fig.savefig('test.png')
      For more information, see: Matplotlib Documentation on running without a GUI
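
If the same script is run both interactively and in batch jobs, one possible variation is to pick the non-interactive backend only when no display appears to be available. The sketch below is an assumption-laden heuristic rather than a recommended site practice: it treats an unset DISPLAY environment variable as "no X server", and it uses the MPLBACKEND environment variable, which matplotlib reads when it is first imported. The function name choose_backend is hypothetical.

```python
import os

def choose_backend(environ):
    """Return 'Agg' when no X display is advertised, else None (keep default)."""
    if environ.get("DISPLAY"):
        return None
    return "Agg"

backend = choose_backend(os.environ)
if backend is not None:
    # Must happen before matplotlib is imported; does not override an
    # MPLBACKEND value the user has already set.
    os.environ.setdefault("MPLBACKEND", backend)

# In a real script, "import matplotlib.pyplot as plt" would follow here.
print("backend override:", backend)
```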

    Numba and GPU Support

    The most recent versions of Python installed (e.g. 3.5.1) provide a python module called "numba". Numba allows certain portions of python code to be compiled to lower-level machine code to improve performance, in many cases simply by adding the directive "@jit" before the function to compile. Depending on the function, one might achieve order-of-magnitude performance gains. E.g. (example taken from wikipedia):

    from numba import jit

    # Compiled to machine code the first time it is called (just-in-time)
    @jit
    def sum1d(my_array):
        total = 0.0
        for i in range(my_array.shape[0]):
            total += my_array[i]
        return total

    Here, the addition of the "@jit" (for just-in-time compilation) can result in code running 100-200 times faster than the original on a long Numpy array, and up to 30% faster than Numpy's builtin "sum()" function, on standard CPU cores.
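
To see the effect on your own system, you can time the first and second calls of a jitted function; the first call includes the compilation itself. The following is a rough sketch, not a benchmark: the array length N is arbitrary, len() is used in place of shape[0] so the same function also accepts a plain list, and the do-nothing fallback decorator is an addition so the script still runs where numba is not installed. No particular speedup is implied.

```python
import time

N = 100000  # arbitrary problem size for illustration

try:
    import numpy as np
    from numba import jit
    data = np.arange(N, dtype=np.float64)
    have_numba = True
except ImportError:
    # Fallback so the sketch still runs: a decorator that changes nothing
    def jit(func):
        return func
    data = [float(i) for i in range(N)]
    have_numba = False

@jit
def sum1d(my_array):
    total = 0.0
    for i in range(len(my_array)):
        total += my_array[i]
    return total

start = time.perf_counter()
first = sum1d(data)    # includes compilation time when numba is present
middle = time.perf_counter()
second = sum1d(data)   # reuses the already compiled version
end = time.perf_counter()

print("numba available:", have_numba)
print("first call  : %.6f s" % (middle - start))
print("second call : %.6f s" % (end - middle))
```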

    Some codes can perform even better on GPUs, and Numba can make this fairly simple by importing "cuda" from numba and using "cuda.jit" in place of "jit". There are constraints imposed when using GPUs, so not every code can be easily converted for GPU use.

    To use Numba with GPUs on the Deepthought clusters, you will need to

    1. Request a GPU-enabled node
    2. Load an appropriate version of CUDA. Currently, cuda/7.0.28 or cuda/7.5.18 will work with Numba.

    The details of using Numba, and especially using Numba with CUDA, are well beyond the scope of this document. Some useful links for more information are:

    Using python with MPI

    If you wish to take advantage of the multiple cores and even many nodes available on High Performance Computing (HPC) clusters, it is useful to use the Message Passing Interface (MPI), a standard and ubiquitous programming methodology for distributed memory parallelism, to coordinate and communicate among the various processes.

    There is a package mpi4py available on all Pythons installed system-wide on the Deepthought clusters which makes the various MPI calls available to python code. Because mpi4py closely mimics the function calls in the standard MPI library/API, it makes the task of transcribing algorithms between python and C much easier.
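
As an illustration of what such code can look like, here is a minimal sketch in which each MPI process sums its own block of the integers 0..999 and the partial sums are combined with allreduce. The helper block_range and the problem size are hypothetical choices for this example, and the single-process fallback is an addition so the script also runs where mpi4py is not installed; it is not a tuned or site-endorsed pattern.

```python
def block_range(n_items, rank, size):
    """Return the half-open index range [start, stop) owned by this rank."""
    base, extra = divmod(n_items, size)
    # The first `extra` ranks each take one additional item
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()
except ImportError:
    # Single-process fallback so the sketch runs without mpi4py
    MPI = None
    comm, rank, size = None, 0, 1

n_items = 1000
start, stop = block_range(n_items, rank, size)
local_sum = sum(range(start, stop))

if comm is not None:
    # Every rank receives the combined result
    total = comm.allreduce(local_sum, op=MPI.SUM)
else:
    total = local_sum

if rank == 0:
    print("ranks:", size, "total:", total)
```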

    When you have python code (e.g. my-mpi4py-script.py) designed to use MPI via mpi4py, you will normally wish to execute the python code using the mpirun command. It is important that you use the mpirun command from the SAME MPI library as was used to build mpi4py for the python version you are running --- typically this will mean using module load to load the correct gcc compiler and openmpi version as used in building the python interpreter and modules, as listed in the version information table at the top of this document. E.g., a job submission script to launch my-mpi4py-script.py on 40 cores using python/3.5.1 might look like:

    #!/bin/bash
    #Assume will be finished in no more than 8 hours
    #SBATCH -t 8:00:00
    #Launch on 40 cores distributed over as many nodes as needed
    #SBATCH -n 40
    #Assume need 6 GB/core (6144 MB/core)
    #SBATCH --mem-per-cpu=6144
    
    #Make sure module cmd gets defined
    . ~/.profile
    
    #Load required modules
    module load python/3.5.1
    #Load correct gcc (4.9.3) and mpi (openmpi/1.8.6) for python/3.5.1
    module load gcc/4.9.3
    module load openmpi/1.8.6
    
    #Normally do not need to give -n 40, as openmpi will determine from Slurm
    #environment variables
    mpirun python my-mpi4py-script.py
    
    

    Although exploring mpi4py is beyond the scope of this document, we do provide some on-line tutorials, etc., to help if you wish to explore mpi4py further: