This is a "quick start" introduction to using the HPC clusters at the University of Maryland from the Linux/Unix command line. It covers the general activities most users will deal with when using the clusters.
NOTE: This document covers the command line interface. Users without experience with the Unix/Linux command line might prefer to look at the quickstart OnDemand web portal, which they will likely find easier to use. Also, users of Matlab Parallel Server (formerly known as Matlab Distributed Computing Server (MDCS)) can access some aspects of the cluster from Matlab running on their workstation, avoiding some of the complexities of the Linux command line.
This quick start assumes that you already
All of the clusters have at least 2 nodes available for users to log into. From these nodes you can submit and monitor your jobs, look at results of the jobs, etc.
DO NOT RUN computationally intensive processes on the login nodes! Such processes violate policy, interfere with other users of the clusters, and will be killed without warning. Repeated offenses can lead to suspension of your privilege to use the clusters.
For most tasks you will wish to accomplish, you will start by logging into one of the login nodes for the appropriate cluster. These are:
ssh -l johndoe login.zaratan.umd.edu
See the section on logging into the clusters for more information.
Next, you'll need to create a job script. This is just a simple shell script that will specify the necessary job parameters and then run your program.
Here's an example of a simple script, which we'll call test.sh:
#!/bin/bash
#SBATCH -t 1
#SBATCH -n 4
#SBATCH --mem-per-cpu=128
#SBATCH --oversubscribe

. ~/.bashrc
module load python
hostname
date
The first line, the shebang, specifies the shell used to run the script. Note that you must have a shebang specifying a valid shell for Slurm to accept and run your job; this differs from Moab/PBS/Torque, which ignore the shebang and run the job in your default shell unless you give qsub an option for a different shell.
The next four lines specify parameters to the scheduler.
The first of these, -t, specifies the maximum amount of wall time you expect your job to run. It can take various forms, but usually you will want to give minutes, hours:minutes:seconds, or days-hours. You should always set a reasonable wall time limit; this helps improve utilization of the cluster and reduces the amount of time your job waits in the queue. To encourage this, the default wall time limit is rather short. In this example, we specified a wall time limit of 1 minute; normally this would be much longer, but this is a trivial job.
See the section on specifying the walltime limit for more information.
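The different time formats can be hard to compare at a glance. As a rough illustration, the sketch below converts a -t value into seconds. The function name walltime_to_seconds is hypothetical (it is not a Slurm command), and the sketch assumes only the standard Slurm forms: minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes, and days-hours:minutes:seconds.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): convert a -t value to seconds.
# Accepts M, M:S, H:M:S, D-H, D-H:M and D-H:M:S.
walltime_to_seconds() {
    local spec=$1 days=0 rest h=0 m=0 s=0
    local -a parts
    if [[ $spec == *-* ]]; then
        # Forms with a day count: everything after the dash starts at hours
        days=${spec%%-*}
        rest=${spec#*-}
        IFS=: read -r -a parts <<< "$rest"
        h=${parts[0]:-0}; m=${parts[1]:-0}; s=${parts[2]:-0}
    else
        IFS=: read -r -a parts <<< "$spec"
        case ${#parts[@]} in
            1) m=${parts[0]} ;;
            2) m=${parts[0]}; s=${parts[1]} ;;
            3) h=${parts[0]}; m=${parts[1]}; s=${parts[2]} ;;
            *) echo "unrecognized time spec: $spec" >&2; return 1 ;;
        esac
    fi
    # 10# forces base 10 so values such as 08 are not read as octal
    echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}
```

For instance, walltime_to_seconds 1 corresponds to the 1-minute limit in the sample script and prints 60, while walltime_to_seconds 2-12 (two days and twelve hours) prints 216000.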
The second line, -n, tells the scheduler how many tasks/cores your job will have (by default Slurm assigns a distinct core to each task). We do not specify how Slurm should distribute these cores across machines, so Slurm can distribute them however it sees fit. That is sufficient for many MPI jobs, but there are other options that allow very detailed specification of how the cores should be distributed, as briefly described here and in the examples page.
In this example, we are requesting 4 cores (which is way more than needed for this trivial example). Most likely we will get all 4 cores on a single node, but that is NOT guaranteed. We could possibly get one core on each of 4 nodes, or some allocation of 4 cores on 2 or 3 nodes.
See the section on specifying the node/core requirements for more information.
The third line, --mem-per-cpu, tells the scheduler how much memory to allocate for your job. This particular form reserves the requested amount of memory (N MB) per CPU core assigned.
A similar form,
--mem=N, reserves the requested
amount of memory (N MB) for the entire job. The
is usually more convenient. Nodes on the Zaratan cluster
should have at least 4 GB/core.
In this example, we are requesting 128 MB per core, for a total of 512 MB for
our 4 core job. If we used
--mem=128 we would get a total of 128 MB
(or effectively 32 MB per core), which for this trivial job is still way more than
is actually needed.
See the section on specifying the memory requirements for more information.
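The arithmetic in the memory example above can be checked with a few lines of shell. This is only a sketch of the calculation as the text describes it; the function names total_mem_mb and mem_per_core_mb are hypothetical and not part of Slurm.

```shell
#!/bin/bash
# Total memory reserved when memory is requested per core (--mem-per-cpu)
total_mem_mb() {        # usage: total_mem_mb MB_PER_CORE CORES
    echo $(( $1 * $2 ))
}

# Effective memory per core when one amount is requested with --mem
# (integer division, so any remainder is dropped)
mem_per_core_mb() {     # usage: mem_per_core_mb TOTAL_MB CORES
    echo $(( $1 / $2 ))
}

total_mem_mb 128 4      # --mem-per-cpu=128 with -n 4: prints 512
mem_per_core_mb 128 4   # --mem=128 with -n 4: prints 32
```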
The fourth line, --oversubscribe, is the default for the Zaratan HPC cluster, and states that we are willing to share a node with other jobs. E.g., on Zaratan, all nodes have 128 cores; in --oversubscribe mode, if all of our cores are assigned to one such node, Slurm will reserve 4 cores for us, but can assign the other 124 cores to other jobs while our job is running. The alternative is --exclusive, which prevents other jobs from running on the same node(s) as the exclusive job. If our sample job were --exclusive and assigned to a 128 core node, the other 124 cores would be unassigned and idle while the job ran.
NOTE: exclusive jobs get charged for both the cores they use AND the cores they prevent anyone else from using due to the exclusive status. E.g., if the example job was --exclusive and assigned a 128 core node, it would accrue charges for 128 cores for as long as it ran.
See the section on specifying whether other jobs can be on the same node for more information.
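To make the charging rule concrete, here is a simplified sketch of the core count a job would be charged for under the description above. The function charged_cores is hypothetical, and the cluster's real accounting may take other factors into account; this only restates the 4-core versus 128-core comparison from the text.

```shell
#!/bin/bash
# Cores a job is charged for, following the description above (simplified).
# usage: charged_cores MODE REQUESTED_CORES CORES_PER_NODE [NODES]
charged_cores() {
    local mode=$1 requested=$2 per_node=$3 nodes=${4:-1}
    if [ "$mode" = "exclusive" ]; then
        echo $(( per_node * nodes ))   # every core on each node the job holds
    else
        echo "$requested"              # only the cores actually reserved
    fi
}

charged_cores oversubscribe 4 128   # prints 4
charged_cores exclusive 4 128       # prints 128
```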
Users of the Zaratan HPC cluster do not need to specify a partition when using the standard partition. However, a partition must be specified when using GPUs or large memory nodes, or the debug or scavenger partitions.
It is advisable to include at least these four above options
(wall time limit, number of cores, memory and either exclusivity or partition
depending on the cluster) for all jobs,
either in the job script as shown, or on the sbatch command line
(see for general information
on providing options to the
sbatch command). There are many other possible arguments to the sbatch command; the more commonly used ones are described here.
The remaining lines in the file are just standard commands; you will replace them with whatever your job requires. In this case, once the job runs, it will print the hostname and the time to the output file. The script will be run in whatever shell is specified by the shebang on the first line of the script. NOTE: unlike with the Moab scheduler, you MUST provide a valid shebang on the first line.
Note that when your job starts, your job script is executed on the
first node assigned to your job. The list of nodes assigned to your
job, etc. are available in Slurm
environment variables, but Slurm does not do anything to parallelize
your job. Your script is responsible for farming out tasks to the different
cores/nodes that are part of the job. Normally, a parallel application
will handle that, or you issue your MPI-aware code with
which handles that.
See the section on running MPI jobs for more information.
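As an illustration of the environment variables mentioned above, the following sketch prints a few that Slurm sets inside a running job (SLURM_JOB_ID, SLURM_JOB_NODELIST, SLURM_JOB_NUM_NODES and SLURM_NTASKS). The function name describe_allocation is hypothetical; outside of a job the variables are unset, so a placeholder is printed instead.

```shell
#!/bin/bash
# Summarize the allocation using variables Slurm sets inside a job.
# Outside a job they are unset, so "unset" is printed as a placeholder.
describe_allocation() {
    echo "job id:    ${SLURM_JOB_ID:-unset}"
    echo "node list: ${SLURM_JOB_NODELIST:-unset}"
    echo "nodes:     ${SLURM_JOB_NUM_NODES:-unset}"
    echo "tasks:     ${SLURM_NTASKS:-unset}"
}

describe_allocation
```

Placing a call like this near the top of a job script records in the output file where the job actually ran, which can help when diagnosing problems later.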
In particular, note that the example given is BAD. Although it requests 4 cores, all the commands listed (hostname, date) are single core commands, so 3 of the requested cores will actually be idle while the job is running. Since this job is just a simple example and will finish in seconds, that is not a big issue in this case. But in general, simply submitting serial code as an sbatch job requesting more than one core DOES NOT parallelize a job.
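One simple way to keep all the requested cores busy is to start the pieces of work in the background and wait for them. The sketch below assumes the pieces are independent of each other and that all requested cores are on the node running the script, which the sample script does not guarantee (adding sbatch's -N 1 option to the request is one way to ask for a single node). The work function is a hypothetical stand-in for your real program; MPI codes are launched differently.

```shell
#!/bin/bash
# Number of tasks Slurm granted (-n); falls back to 4 when run outside a job
ntasks=${SLURM_NTASKS:-4}

# Hypothetical stand-in for one independent piece of serial work
work() {
    echo "piece $1 finished on $(uname -n)"
}

# Start one piece per requested core, then wait for all of them
run_all() {
    local i
    for (( i = 1; i <= ntasks; i++ )); do
        work "$i" &     # run in the background so the pieces overlap
    done
    wait                # do not return until every piece has finished
}

run_all
```

The wait at the end matters: without it the script, and therefore the job, could end while pieces are still running.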
For users of the Zaratan HPC cluster: If your job script uses bash and that is NOT your default shell, you should begin the code section of your script with the line . ~/.bashrc, as the example script above does. Generally, this should be followed by module load commands for whatever modules your job requires.
See the section on using the module command for more information.
It is recommended that you include the relevant module commands for a job in the actual job script, as opposed to relying on modules loaded by your dot files.
Now that you have a job script, you need to submit the job
to the cluster with the
sbatch command. For example,
login-1:~: sbatch test.sh
Submitted batch job 13222
The number returned is the identifier for the job. Use it any time you want to find out more about your job, and include it if you open a help ticket about the job.
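If you submit jobs from a script, it can be handy to capture the job number rather than copy it by hand. The sketch below pulls the number out of the "Submitted batch job NNNN" message shown above; parse_job_id is a hypothetical helper name, and the message is hard-coded here so the snippet runs without access to the cluster.

```shell
#!/bin/bash
# Extract the job ID from sbatch's "Submitted batch job NNNN" message
parse_job_id() {
    echo "${1##* }"     # keep only the text after the last space
}

# In a real session the message would come from sbatch itself, e.g.
#   jobid=$(parse_job_id "$(sbatch test.sh)")
jobid=$(parse_job_id "Submitted batch job 13222")
echo "$jobid"           # prints 13222
```

sbatch also has a --parsable option that prints the job ID without the surrounding text, which may be simpler if you do not need the full message.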
At this point, your job has been placed in the queue, and will wait its turn for resources to become available. Depending on how heavily used the cluster is at that time, and how many resources you are requesting, your job might start within minutes or it might wait for hours or even days. (And this is assuming that there are sufficient funds in the allocation, etc.) See the FAQ for tips on how to reduce the amount of time your job spends waiting in the queue.
Once resources become available, Slurm will assign resources to your job, including one or more cores on one or more nodes. A shell process will start on the first core of the first node assigned, and your script will run. Normally, your script will start any other tasks on the same or on other nodes as needed.
The standard output and standard error streams will be directed
to a file, by default
slurm-NNNN.out in the directory
where you started the job, where the NNNN is the job number
as described above. See the section on
specifying output options
for more information.
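As a small illustration, the sketch below builds the default output file name from a job number and shows the last lines of that file if it exists. It assumes the default name has the form slurm-NNNN.out, as in the slurm-13222.out example later in this document, and that you are in the directory where the job was submitted; both function names are hypothetical.

```shell
#!/bin/bash
# Name of the default output file for a given job ID
slurm_output_file() {
    echo "slurm-$1.out"
}

# Print the last lines of a job's output, if the file exists yet
show_job_output() {
    local file
    file=$(slurm_output_file "$1")
    if [ -f "$file" ]; then
        tail -n 20 "$file"
    else
        echo "$file not found (the job may not have started yet)" >&2
        return 1
    fi
}
```

For example, show_job_output 13222 would display the end of slurm-13222.out once the job has started writing to it.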
Do NOT start jobs from your home directory. It is NOT optimized for high I/O. Use scratch space instead. See the section on storage for more information on storage options.
Output from your job can be viewed in the file specified above shortly after the job starts running (assuming it has output something). This can be used to check the status of your job, although if your code generates a lot of output, it is advisable to redirect it to another file. See the section on storage for more information on storage options.
For our trivial example from the last section, when the job completes we should see something like
l:~: cat slurm-13222.out
compute-b6-39.zaratan.umd.edu
Wed May 21 18:38:06 EDT 2022
As you can see in the output file above, the script ran and printed the hostname and date as specified by the job script.
The basic command for monitoring your jobs' status is the squeue command. Because you are normally only interested in your own jobs, it is advisable to add the -u USERNAME flag, which speeds up the command and shows only your jobs. Replace USERNAME with your username.
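A minimal sketch of a wrapper for this follows, using squeue's -u option, which restricts the listing to one user. The function name my_jobs is hypothetical; it defaults to the current user and simply reports an error when squeue is not installed, so it can be tried safely away from the cluster.

```shell
#!/bin/bash
# List only one user's jobs (defaults to the user running the command)
my_jobs() {
    local user=${1:-${USER:-$(id -un)}}
    if command -v squeue >/dev/null 2>&1; then
        squeue -u "$user"
    else
        echo "squeue not found; run this on a cluster login node" >&2
        return 1
    fi
}
```

On a login node, running my_jobs with no arguments lists your own jobs, and my_jobs johndoe lists those of the user johndoe.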
It is often useful to be able to see the status of the cluster as a whole, including information about how busy the cluster is at a given point in time.
The squeue command without any arguments will list all jobs in the queue. This can be overwhelming, however, as there are often many, many jobs.
The sinfo -N command can show you information about the nodes in
the cluster. Again, this is a dense text output, so can be difficult to
smap uses ascii graphics to present this information in a
more graphical and hopefully more digestible fashion.
sview uses X11 graphics for an even prettier overview of the