Frequently Asked Questions about the University's High-Performance Computing Clusters

Introduction

Below are answers to some commonly asked questions about the University of Maryland's high-performance computing (HPC) clusters (HPCCs). More in-depth information on a given topic can be found by following the links to more in-depth information.

I ) Introduction to the Clusters/General Issues

I-1 ) What are the high-performance computing clusters?
I-2 ) Who owns/runs the cluster?
I-3 ) What is the Allocations and Advisory Committee (AAC)?
I-4 ) Where can I find detailed documentation on the clusters?
I-5 ) What are the advantages of joining one of the campus clusters as opposed to starting my own?
I-6 ) How do I contribute to one of the campus clusters?
I-7 ) How can I get help using the clusters?
I-8 ) How should I acknowledge the use of one of the clusters in papers, etc.?
I-9 ) I just received email about my home directory being over quota. What does this mean?

II ) Access to the system

II-1 ) How do I get access to the system?
II-2 ) My research group contributed to one of the clusters. How do I get access to the system?
II-3 ) My advisor or research group already has an allocation from the AAC. How do I get access to the system?
II-4 ) Can I get access to the system, even if I/my research group did not contribute to one of the campus clusters?
II-5 ) What is a TerpConnect/Glue account and how do I get one?
II-6 ) How do I get an associate/colleague/student/etc added to my allocation?
II-7 ) I left the university. What will happen to my access to the cluster/data on the cluster?
II-8 ) Can I access the cluster from a foreign country?

III ) Issues connecting, etc

III-1 ) Can I access the cluster from a foreign country?
III-2 ) I cannot connect to the system. What is wrong?
III-3 ) I cannot transfer files. What gives?
III-4 ) I get 'Cannot connect to server' and/or 'noVNC' errors when trying to launch an interactive application in the OnDemand portal
III-5 ) I am getting warnings about keys and fingerprints when I try to ssh. Should I be concerned?
III-6 ) How do I change my password?
III-7 ) I forgot my password. What can I do?
III-8 ) When I try to login, I get an error like 'Resource temporarily unavailable'. What should I do?
III-9 ) Why was I advised not to use vscode on the HPC systems?

IV ) Slurm issues/error messages/warnings/etc

IV-1 ) What does "sbatch: error: This does not look like a batch script" mean?
IV-2 ) Sbatch errors with "Batch job submission failed: Job violates accounting/QOS policy (job sumbit limit, user's size and/or time limits)". What does that mean?
IV-3 ) What do "(AssocGrpCPUMinsLimit)", "(AssociationJobLimit)" or "(AssocGrpBillingMinutes)" mean?
IV-4 ) My job failed with an error like 'slurmstepd: error: *** JOB NUMBER ON NodeName CANCELLED AT Time DUE TO TIME LIMIT ***'. What does that mean?
IV-5 ) My job failed with an error 'slurmstepd error: Exceeded step memory limit at some point'. What does that mean?
IV-6 ) What does "(QOSResourceLimit)" mean?
IV-7 ) My openmpi job complains about CUDA libraries. What does this mean?
IV-8 ) Why is my job complaining about not finding the 'module' command ?
IV-9 ) What does "Quota" mean in the status field for a job when using the showq command?

V ) Issues running jobs

V-1 ) My job is spending a lot of time in the queue. Why? When will it start?
V-2 ) How can I reduce the amount of time my job spends waiting in the queue?
V-3 ) My batch job finishes as soon as it starts with no/little output
V-4 ) My OpenMPI performances is much less than expected. What gives?
V-5 ) What does "(AssocGrpCPUMinsLimit)" or "(AssociationJobLimit)" mean?
V-6 ) What does "(QOSResourceLimit)" mean?
V-7 ) My openmpi job complains about CUDA libraries. What does this mean?
V-8 ) My OpenMPI job has a warning about 'N more processes have sent' some message. What does that mean?
V-9 ) What does "Quota" mean in the status field for a job when using the showq command?

VI ) Questions about job accounting

VI-1 ) What is an SU? Or a kSU?
VI-2 ) Which allocations do I have access to?
VI-3 ) Which allocation should I charge my job against?
VI-4 ) Why am I receiving an error re disk quota exceeded?
VI-5 ) Why do scratch_quota and du -h not agree?

FAQ I) Introduction to the Clusters/General Issues

FAQ I-1) What are the high-performance computing clusters?

The Division of Information Technology provides several high-performance computing (HPC) clusters for general campus use. These are beowolf clusters consisting of hundreds of x86_64 based compute nodes configured for running large-scale calculations in support of the campus research community. They are especially designed for parallel computation.

The clusters are:

The Zaratan cluster
The Juggernaut cluster

FAQ I-2) Who owns/runs the cluster?

Both the Zaratan and Juggernaut clusters were purchased with funds from the Division of Information Technology at the University of Maryland along with contributions from various colleges, departments and research groups. Contributing groups receive high priority allocations, replenished quarterly, based on the amount of CPU time their contribution added to the cluster.

Both clusters are managed by the Division of Information Technology, and campus IT staff handle all the hardware and system issues, and maintain a large software collection for the users.

FAQ I-3) What is the Allocations and Advisory Committee (AAC)?

The Allocations and Advisory Committee (or AAC for short) is composed of faculty representing colleges and/or departments which have contributed hardware to the Zaratan cluster. This group provides oversight, sets policy, and allocates computational resources to campus researchers.

FAQ I-4) Where can I find detailed documentation on the clusters?

Cluster documentation, including hardware configurations, available software, status, reports, and more are available at /hpcc

Detailed information about each of the various clusters can be found at:

the Zaratan cluster
the Juggernaut cluster

FAQ I-5) What are the advantages of joining one of the campus clusters as opposed to starting my own?

HPC clusters take a significant amount of work to set up, and after the initial procurement and installation, they also take a fair amount of time to maintain. They also need to be housed in spaces capable of supporting their demanding power and cooling needs. Because the Division of IT takes care of all of this for you, you can focus exclusively on your research needs without the added burden of managing your own IT environment.

Joining one of the campus cluster also provides flexibility with regard to running jobs that you might not be able to otherwise. For example, if you have already contributed several nodes and you would like to see if you applications would benefit from greater parallization, you could run a larger multi-core job even your contribution was smaller than that. (You wouldn't want to do that indefinitely since you would likely exhaust your allocation, but you could certainly do it on occasion should the need arise). Similarly, if you need to run a large number of jobs within a short time period to meet a particular deadline, you can "borrow ahead" on your allocation and obtain additional compute power when you need it. When you purchase your own computing environment, you cannot exceed its maximum compute capacity, and idle cycles cannot be reclaimed, as they can due to the flexible allocation scheme provided by the campus clusters.

Researchers who are unsure about HPC and whether it will improve their throughput are encouraged apply for a developmental allocation from the AAC. This will enable you to determine whether the use of HPC resources can benefit your research without having to invest in hardware.

FAQ I-6) How do I contribute to one of the campus clusters?

The Allocations and Advisory Committee (AAC) and the Division of Information Technology can help. Send email to the Division of Information Technology to let us know of your interest . We will discuss your research requirements with you and work with AAC members to determine whether the cluster or high-performance computing in general is appropriate for your type of research. Test allocations are also available to help determine whether and how much your application would benefit from running in a high-performance computing environment.

Once everyone is in agreement that that an investment in the cluster makes sense, we will initiate discussions with the AAC and cluster administrators to iron out the specifications and associated costs of the hardware contribution.

See the section on contributing the the UMD HPC environment for more information about the benefits of contributing to the cluster and how to start a dialog about doing so.

FAQ I-7) How can I get help using the clusters?

The systems staff for the HPC clusters will try to assist you with basic issues of accessing and using the system, but any but the most basic questions regarding the use of the various research software applications, and especially questions invovling details of your field of study, are likely to be beyond our expertise, and yuo are best off directing such questions at your colleagues.

We hope that our usage documentation will answer most questions, and other pages provide further mechanisms for getting assistance.

Basically, for system type questions, you can open a help or trouble ticket, and for application type questions you might get help from our hpcc-discuss mailing list.

FAQ I-8) How should I acknowledge the use of one of the clusters in papers, etc.?

Maintaining a first class HPC environment is expensive, and the Division of Information Technology request that you acknowledge your use of our clusters in papers or other publications of research which benefited from this campus resource. Such acknowledgements assist us in convincing people of the value of this resource to campus, and helps us to obtain funding for its continued maintenance and/or future expansion.

To acknowledge your use of the cluster, we request that you use this wording.

FAQ I-9) I just received email about my home directory being over quota. What does this mean?

Home directories on the Zaratan HPC cluster are not intended for the storage of large data sets. They are located on disks chosen for reliability over speed, so they are not optimized for heavy I/O, and they are backed up. As such, the available space is more limited than in, e.g., lustre data stores.

Because of this, we have a soft quota policy on these directories. You should keep the size of your home directory under 10 GB. However, because we recognize the large data demands of our HPC users, this is not a hard quota; if your usage is at 7 GB and you copy a 5 GB file into your home directory, we don't have a hard limit and kill the transfer when 10 GB is hit. Instead, we allow you to exceed the 10 GB soft quota by reasonable amounts for up to a week. When you are exceeding the soft quota, however, you will receive an email daily informing you of that, and asking you to remedy the matter before the week is up.

This policy is designed to try to give you the flexibility of storing large amounts of data in your home directory temporarily, without taxing the system unduly and interfering with the work of your colleagues on the cluster.

There are two types of email warnings you will get if the usage on your home directory goes over the 10 GB soft quota.

The first occurs while you are still in the 7 day grace period, and has a subject like Friendly notice: Your home directory on zaratan is over quota. This email is to alert you to the fact that you have gone over the 10 GB soft quota, and that your 7 day grace period countdown has started. At this point, you are still in accordance with the policies on the clusters, but you should look into reducing the disk usage in your home directory before the grace period runs out. You should, at your earliest convenience, delete unneeded files, transfer data off the cluster, or move data needed for active research to lustre or other storage. These emails will occur for as long as you are over the 10 GB soft quota.

If you fail to reduce usage by the requested date, you will get the second email, with a subject like URGENT: You are OVER your quota on Zaratan homespace. This is more serious, and if you receive this email you are in violation of cluster policy. If you receive this email, you MUST reduce your homespace usage ASAP.

Unlike the "Friendly notice" emails, the emails when you exceed the grace period go to system staff as well, and if we do not see prompt action to rectify the matter we will contact your advisor and/or suspend your privilege to use the HPC clusters, as you are in violation of policy and negatively impacting the use of the cluster by other users.

FAQ II) Access to the system

FAQ II-1) How do I get access to the system?

There are basically two methods for getting access to the system:

If you or a research group/department/college you are affiliated with has contributed to the Zaratan cluster, you can request access to the allocation of the contributing unit.
If you are NOT affiliated with a unit that contributed to the Zaratan cluster, you can apply to the AAC for access

These are explained in more detail below.

FAQ II-2) My research group contributed to one of the clusters. How do I get access to the system?

Contact the person responsible for your research group's cluster allocation. Your colleagues and/or advisor should be able to direct you to that individual. Have the allocation owner send email to hpcc-help@umd.edu requesting that your account be granted access. The message should contain your name, your University ID, and the name of the allocation group, and the name of the cluster (Zaratan). Requests must come from a recognized point of contact for the allocation; any other requests will be ignored.

An email will be sent to to you within two business days of the request confirming that access to the requested allocation group has been granted. All cluster-related communication is sent to your @umd.edu account, so please monitor all communications and honor any requests from systems staff sent to that address.

FAQ II-3) My advisor or research group already has an allocation from the AAC. How do I get access to the system?

For the Zaratan cluster, this is the same process as for research groups that contributed to the cluster. Basically, have the point of contact submit a request, as described above.

FAQ II-4) Can I get access to the system, even if I/my research group did not contribute to one of the campus clusters?

The Allocations and Advisory Committee (AAC) considers all applications for access to the HPC clusters for general campus use. These requests are most often awarded to researchers investigating HPC computations (e.g. is the application suited to parallelization, how much speed up would it get at various levels of parallelization, etc), or for projects which would benefit from HPC methodology but are limited enough in scale that building a HPCC for the project is not cost effective.

Small, kSU developmental allocations are also available for researchers who wish to investigate whether their research could benefit from HPC resources. This allows you to "test drive" an HPC cluster without the monetary investment.

There is no monetary charge to the applicant for the AAC granted allocations. (NOTE: Software licensing, etc. is NOT included in the allocation, even if mentioned in the application. If restrictively licensed software is required, you must provide the licenses. Please contact the Division of Information Technology BEFORE making any software purchases to ensure your license is compatible with the HPC cluster.)

Please see the section on Requesting an Allocation from the AAC for more information.

FAQ II-5) What is a TerpConnect/Glue account and how do I get one?

TerpConnect is the University of Maryland's Unix environment for students, faculty and staff. It is part of a larger Unix environment (named Glue) maintained by the Division of Information Technology.

To access any of these Unix environments (including the Zaratan HPC environments) you need to have a TerpConnect/Glue account. The username and password for this will be the same as your campus Directory ID and password. Generally your TerpConnect/Glue account should be automatically activated; the only case where that might not occur is with . Generally your TerpConnect/Glue account should be automatically activated; the only case where that might not occur is with some affiliate accounts that have not been granted permission for the required service. In such cases, please contact your sponsor and/or PHR person, and look at the list of required services in order for affiliates to have access to the HPC clusters.

If you are not a member of the University of Maryland (e.g. a faculty member, currently registered student, or staff) you might still be able to get a TerpConnect account if you are working with a faculty member who is willing to sponsor you as an affiliate.

FAQ II-6) How do I get an associate/colleague/student/etc added to my allocation?

DO NOT SHARE YOUR PASSWORD with them, or anyone.

The procedure to follow depends on the cluster.

For the Zaratan cluster:
If you are the owner or point-of-contact for an allocation, you can just send email to hpcc-help@umd.edu requesting that the person be granted access. The message should contain your name and University ID, the name and University ID (e.g. their @umd.edu email address) of the person to be added, the name of your allocation, and the cluster it is on (Zaratan). Requests must come from a recognized point of contact for the allocation; any other requests will be ignored. You can add points of contact to your allocation in the same way, just be sure to state that you wish the person added as a point of contact for the allocation (and not just as an user of the allocation).

Email will be sent to to you within two business days of the request confirming that access to the requested allocation group has been granted. If you do not see such, feel free to follow up to confirm that the request was processed.

Before someone can be added to your allocation, they must have an active TerpConnect/Glue account. These are readily available to members of the University community, and there are procedures to get such for colleagues, etc. not formally affiliated with the university.

FAQ II-7) I left the university. What will happen to my access to the cluster/data on the cluster?

The HPC resources are intended for the use of faculty, staff, and students at the University of Maryland. If your formal association with the University ceases, e.g. you are taking a position at another University or you graduate from the University, your accounts at the University (and thus your access to the cluster) will be disabled. The schedule for disabling of access is somewhat complicated, depending on employment, student and other statuses; see

The referenced knowledge base articles are fairly generic. Once your ability to access services requiring use of your Directory ID and password, you will lose your access to the HPC clusters. Your TerpConnect/Glue account is typically not deleted for a month or two after access is disabled. Your HPC account will be deleted shortly after the TerpConnect/Glue account is disabled; at this time any data in your HPC home or lustre directories will be quarantined for up to 60 days (although they can be deleted earlier with approval from the points-of-contact from the last allocation(s) you had access to). If your access to the cluster is reinstated, we will restore the data if it is still available. If you will not be reinstated but need some of your data and you can find a colleague with HPC access who is willing to assist you in transferring the data, you can contact hpcc-help@umd.edu requesting that the ownership of the data in question be transferred to your colleague. Please be certain to discuss this with your colleague beforehand, and specify the paths you wish to transfer to your colleague (at minimum, specify if it should just be lustre, or just home directory, or both).

If you will be continuing to collaborate with researchers here at UMD and need to retain access to the cluster, you will need to have one of your colleagues at UMD sponsor you for affiliate status. With such status, your association with the University will be continued for one year (which your colleague at UMD can renew annually if a longer association is needed).

If not, your access to UMD resources will be disabled as per University policy, and any files you owned will be disposed of as per HPC policies.

FAQ II-8) Can I access the cluster from a foreign country?

Generally, you can access the UMD HPC resources from foreign countries, but there are a few exceptions. The US Office of Foreign Assets Control (OFAC) enforces economic and trade sanctions against a small number of countries, and access to the UMD HPC resources is not available from these sanctioned countries.

More information on can be found at this IT Impacts of OFAC Sanctions knowledge base article . In particular, the blocking of the Duo multifactor authentication service will prevent accessing Zaratan from the affected countries.

FAQ III) Issues connecting, etc

FAQ III-1) Can I access the cluster from a foreign country?

FAQ III-2) I cannot connect to the system. What is wrong?

Make sure you are trying to connect to the login node for the desired cluster , as described in the table found in the section on logging into the clusters.. Note: if you drop the login part at the front of the login nodes for the Zaratan cluster, that is a different machine which users are NOT allowed to log into.

Accounts will get disabled if your association with the university ends. I.e., if you graduate or stop registering for classes, or your appointment ends. If you are an affiliate, remember that affiliate status needs to be renewed (by your sponsor) annually.

If none of the above explain your issue, then contact us. To help us diagnose and resolve your issue, please include the following information:

The host you are trying to connect to
The exact command you are using. For windows applications, etc., any settings would be helpful.
The exact error messages if any
Your username. DO NOT INCLUDE PASSWORDS
As accurately as possible, the time of the failed login attempts
If possible, the IP address you are connecting from. The web page http://noc.net.umd.edu/cgi-bin/netmgr/whoami can provide this last bit of information.

FAQ III-3) I cannot transfer files. What gives?

The scp and sftp protocols are very sensitive to spurious output from your initialization scripts. If you can ssh into the box, notice if you are seeing any errors or unusual output. You should only see the "Unauthorized access ..." warning, the "Last login message", and perhaps a message starting like "DISPLAY is ...". If you see anything else, it is likely interfering with the scp and/or sftp programs, and you should edit your initialization scripts. See also the section on suppressing output from dot files.

FAQ III-4) I get 'Cannot connect to server' and/or 'noVNC' errors when trying to launch an interactive application in the OnDemand portal

The VNC connection used for displaying the GUI in OnDemand portal is rather dependend on the browser being used.

In particular, we have seen issues when trying to connect using the Chrome browser on Windows workstations. We recommend that Windows users use Firefox when running interactive applications in the OnDemand portal.

FAQ III-5) I am getting warnings about keys and fingerprints when I try to ssh. Should I be concerned?

The ssh protocol tries to protect you against a number of different threats. There are two possible warnings; that ssh cannot verify the key, or that the key changed. The first is normal, especially on your first login attempt from a system. The latter could signal an attack.

The section on Logging into the System gives more information, including showing samples of both such messages, and providing the key fingerprints for the login nodes of the Zaratan cluster so that you can manually verify the server in the former case.

FAQ III-6) How do I change my password?

The Zaratan cluster uses your standard UMD campus directory ID and password. Information for changing this password can be found in the password change knowledge base article . This includes pictures and a video. You can also use the passwd command from one of the login nodes. NOTE: The aforementioned procedures will change your password on ALL university systems, as they are all part of one common authentication process.

EXCEPTION: If you are using the Zaratan clusters as part of a class via a temporary Glue class account (e.g. something like cmsc622-10xu account name), then your class account is distinct from your normal UMD campus directory ID. If you know the password, you can change the password using the standard Unix passwd command; when logged into the class account, just type passwd. It will prompt you for your current password, then ask you to enter the new password twice. If you forgot your password, you will need to ask your instructor to reset it for you (Instructors can find more information in this regard in the class access section of this documentation.)

FAQ III-7) I forgot my password. What can I do?

Since the Zaratan cluster uses your standard UMD campus directory ID and password, password resets are handled the same way as for this campus-wide password. Information for resetting this password can be found in the password reset knowledge base article . This includes pictures and a video. NOTE: The aforementioned procedures will reset your password on ALL university systems, as they are all part of one common authentication process.

EXCEPTION: If you are using the Zaratan cluster as part of a class via a temporary Glue class account (e.g. something like cmsc622-10xu account name), then your class account is distinct from your normal UMD campus directory ID. In this case, you will need to ask your instructor to reset it for you. Instructors can find more information in this regard in the class access section of this documentation.

FAQ III-8) When I try to login, I get an error like 'Resource temporarily unavailable'. What should I do?

If you see errors like "main: fork: retry: Resource temporarily unavailable" or similar "Resource temporarily unavailable" errors when attempting to log into the cluster, it means that you have two many processes and/or threads running on the login node. We impose a limit on the number of processes and/or threads any given user is allowed on the login nodes to prevent accidental or intentional misuse of the login nodes. You might also encounter these or similar errors when issuing commands on the login nodes.

This error usually arises because you have stale processes from misbehaving programs that you ran previously --- a major offending is the vscode application. We strongly recommend users not to use vscode or other applications known to leave behind stale processes.

If you see this error when attempting to run a particular command on the login node, but not on all or almost all commands, then it is likely the command you are trying to run is not suitable for running on the login node, and should not be run there. You should either submit it as a batch job, or if interactive access is needed, request an interactive desktop in the OnDemand portal and run it there. Either of those options will run the command on a compute node, and we do not impose per-user process/thread limits on the compute nodes.

If this is occuring while you are logged in, you can try to find and kill the stale and/or offending processes. You can use the command ps -vf to see your processes, and you can use the command kill PID to kill the problematic processes, replacing PID with the process identifier given in the first (PID) column of the ps output. You can use the last (COMMAND) field of the ps output to identify what the process it. Again, you need to do this on bothlogin nodes.

If you do not have any important processes on the login node, you can use the command pkill -u $USER to kill all of your processes on that node (this will also kill the shell from where you run the command, effectively logging you out). Again, you need to do this on bothlogin nodes.

If you cannot login, then the above will not work. In that case, or if you are unable to kill the problematic processes yourself, you will need to contact system staff and ask them to delete your stale or otherwise offending processes.

That is only a temporary fix --- if you continue to use programs which have a tendency to leave behind stale processes, you will eventually hit the per-user process/thread limit again unless you proactively and frequently take steps above to check for and delete these stale processes. Again, HPC staff strong recommends you just avoid using the problematic software.

FAQ III-9) Why was I advised not to use vscode on the HPC systems?

Vscode is notorious for leaving behind stale processes and threads. This is a serious problem on multiuser systems like the DIT managed HPC clusters. To prevent assorted misuse of the login nodes, we impose per-user process/thread limits on the login nodes. Because of this, users of vscode are likely to run into errors about resources being unavailable as the stale processes count toward the per-user process limit. At best this leads to annoyances, but frequently it also leads to users being temporarily locked out of the cluster until system staff kill the user's stale processes.

These issues are not issues with the cluster, but due to the problematic behavior of the vscode product. While we do not prohibit the use of vscode on the cluster, we strongly recommend against it. Use vscode on your personal workstation if you so desire, but we recommend against using the vscode tools to login and/or copy files to the cluster --- use standard ssh/scp and similar non-vscode commands to login and transfer files.

If you insist on using vscode despite our recommendations to the contrary, you must regularly and frequently check for and kill stale vscode processes. The basic process is discussed in the FAQ about the resource unavailable errors. Failure to do this diligently could result in the stale processes exceeding the per-user process threshold again, which will temporarily prevent your ability to log into or use the cluster, forcing you to submit a ticket requesting system staff clean up your stale processes. While system staff will do so, please note that such an issue does not constitute an emergency and will generally only be dealt with at standard priority during normal business hours. Again, we strongly recommend you refrain from using vscode on the DIT maintained HPC clusters.

It is also recommended that before you use VScode again (after killing all vscode processes), that you turn off the TypeScript (TS) and JavaScript (JS) features on your VScode installation by following these instructions. Turning off TS and JS seems to greatly reduce if not eliminate the stale process issues, although we still advise you to monitor for and delete stale processes when found. Also, be sure to quit your VScode sessions completely before closing your laptop, logging out, etc. as not doing such may also lead to orphaned processes.

FAQ IV) Slurm issues/error messages/warnings/etc

FAQ IV-1) What does "sbatch: error: This does not look like a batch script" mean?

The Slurm sbatch command requires that your job scripts start with a shebang line, that is a line beginning with #! followed by the path to the shell to be used to process the script. For example, if you have a script written in tcsh shell ( with the set and setenv commands, etc.), your script should start like:

#!/bin/tcsh
#SBATCH -n 1
#SBATCH -t 30:00

setenv WORKDIR $SLURM_SUBMIT_DIR
...

A similar script in the bourne shell would be

#!/bin/bash
#SBATCH -n 1
#SBATCH -t 30:00

. ~/.profile
WORKDIR=$SLURM_SUBMIT_DIR
export WORKDIR

Slurm requires the shebang line, because it searches for that and uses that to determine which shell it should use to run your script. (This differs from the PBS/Torque/Moab/Maui behavior, which ignored the shebang and just ran the script under your default shell unless your specified another shell with a flag to qsub

This error is telling you that Slurm could not find a valid shebang line. Generally, you just need to figure out what shell you were using (if you see setenv commands or variable assignments beginning with set, use /bin/tcsh, if you see export lines or variable assignments without the set command, you probably want /bin/bash).

FAQ IV-2) Sbatch errors with "Batch job submission failed: Job violates accounting/QOS policy (job sumbit limit, user's size and/or time limits)". What does that mean?

Although this cryptic error message can mean a number of things, most often on the Zaratan cluster it means that the allocation account you are trying to charge the job against is out of funds.

Normally, if there are insufficient funds for the completion of your job and all currently running jobs charging against the same allocation account, sbatch will accept the job and place in the queue, and it will simply refuse to run until sufficient funds are available (either due to replenishment or due to currently running jobs using less SUs than anticipated by the scheduler). These jobs will remain queued with the reason "AssocGrpCPUMinsLimit", "AssociationJobLimit" or "AssocGrpBillingMinutes".

However, if the anticipated cost (based on walltime limit and CPU cores requested) of the job you are trying to submit exceeds the limit on the allocation account, the job will refuse to even be queued with the error

Batch job submission failed: Job violates accounting/QOS policy (job sumbit limit, user\'s size and/or time limits)

If the allocation account you are charging the job against is not nearly empty (see the sbalance command), then please contact system staff. Please include the exact sbatch command you gave, and the path to the job submission script.

FAQ IV-3) What do "(AssocGrpCPUMinsLimit)", "(AssociationJobLimit)" or "(AssocGrpBillingMinutes)" mean?

If you see one of (AssocGrpCPUMinsLimit), (AssocGrpBillingMinutes) or (AssociationJobLimit) in the NODELIST(REASON) field of the squeue output for your job, that means that the job is pending because there are insufficient funds in the account that the job is being charged against. If you are using the showq command, this manifests as a Quota message in the State column.

(Actually, there are in general a number of different factors that could be causing this message in Slurm, but on the Zaratan cluster the only one which is relevant is the amount of SUs granted to the account you are charging against.)

Note that for a job to start running, we require that the account has sufficient funds to complete it and all currently running jobs (the amount of funds required for completion is computed based on the amount of walltime requested for the jobs). So even if your account has 3kSU remaining and your job only would consume 0.5 kSU, if you (or others in your group) have other jobs currently running that are anticipated to need more than 2.5 kSU to finish, your new job will not run and be left pending with this status. If those jobs finish before expected, so that there is now enough funds for your job, it will be started when the scheduler nexts examines it.

FAQ IV-4) My job failed with an error like 'slurmstepd: error: *** JOB NUMBER ON NodeName CANCELLED AT Time DUE TO TIME LIMIT ***'. What does that mean?

Every job has a time limit. You are strongly encouraged to explicitly state a time limit in your job (see the section on specifying the amount of time your job will run for more information) as the default time limit is rather small (about 15 minutes). If your job runs past the amount of time that was given to it, it will be killed with an error message like the above.

The time limit for the job is needed for the scheduler to efficiently schedule jobs. Your job will spend less time in the queue if you give a good value for this --- you need to specify a time in which you are sure the job will complete within (because otherwise it will be killed), and you might want to add some modest padding to that time just to be safe. But you do not want to be excessive, either. If you expect your job will only run for an hour, specifying a walltime of 2 hours is not unreasonable (giving it some padding), but 10 hours is excessive and may delay the start of your job. E.g., if the next job in the queue to be scheduled needs 10 nodes, but only 8 nodes are currently free, and the scheduler estimates the remaining two nodes will only become available in 2.5 hours, it might decide to let your 2 hour job run on some of the free nodes since your job will be finished before the 10 node job will need them. But if you specified 10 hours wall time, your job will not fit into that window.

FAQ IV-5) My job failed with an error 'slurmstepd error: Exceeded step memory limit at some point'. What does that mean?

Slurm monitors the memory usage of your jobs, and will cancel you job if it uses more memory than it requested. This is necessary because memory is a shared resource and if your job tries to use more memory than was allocated to it, this could negatively impact other jobs on the same node. This error indicates that at some point, your job used more memory than was allocated to it.

If you are getting this error, you can try increasing the amount of memory that you request. The standard Zaratan nodes all have 4 GiB of RAM per core, for 512 GiB per node. There are also a small number of nodes with 2048 GiB per node on the Zaratan cluster. See the section on specifying memory requirements of your job for more information.

You should try to do an estimate of how much memory your code will need to ensure that you are not hitting a memory leak which will consume however much memory you throw at the job.

FAQ IV-6) What does "(QOSResourceLimit)" mean?

If you see (QOSResourceLimit) in the NODELIST(REASON) field of the squeue output for your job, that means that the job has hit against a limit imposed at the QoS level. On the Zaratan cluster, this usually would mean that you have exceeded the maximum number of jobs by a single user that can run at a given QoS level at the same time.

Some users on the cluster legitimately submit hundreds or thousands of single core jobs at the same time which can run for several days. These jobs can have the adverse affect of blocking more parallel jobs from running. To try to balance this, we have imposed limits on the number of jobs from a given user that can be simultaneously running at a given QoS level. You can submit jobs over this limit, but they will remain in pending states (with a QOSResourceLimit in the NODELIST(REASON) field of the squeue command) until additional run slots are available (i.e. one of your currently running jobs completes).

The exact number of this limit is still subject to some tweaking as we try to find the best number to ensure the cluster can be used well by all of our diverse user base. Currently, this limit is in the thousands, so most users will not be impacted by this at all.

FAQ IV-7) My openmpi job complains about CUDA libraries. What does this mean?

Many OpenMPI jobs may see a warning like the following near the start of their Slurm output file:

--------------------------------------------------------------------------
The library attempted to open the following supporting CUDA libraries,
but each of them failed.  CUDA-aware support is disabled.
libcuda.so.1: cannot open shared object file: No such file or directory
/usr/lib64/libcuda.so.1: cannot open shared object file: No such file or directory
If you are not interested in CUDA-aware support, then run with
--mca mpi_cuda_support 0 to suppress this message.  If you are interested
in CUDA-aware support, then try setting LD_LIBRARY_PATH to the location
of libcuda.so.1 to get passed this issue.
--------------------------------------------------------------------------

These are generally harmless warnings. The first message is stating that one of the MPI threads was not able to open certain CUDA related libraries. CUDA is an API primarily used for interfacing with GPUs --- unless your code was specifically designed to use GPUs and you wanted to use GPUs, you can ignore this message. This warning typically shows up on non-GPU jobs because they usually run on nodes that do not have GPUs and therefore do not have the hardware specific CUDA libraries being referred to.

If you were planning to use GPUs and you get this warning, then there is a problem. Most likely you did not request the scheduler to assign you GPU enabled nodes. If that is not the case, contact systems staff.

As stated in the warning message, if you are not planning on using GPUs and you wish to suppress the error message, you can add the flags --mca mpi_cuda_support 0 to your mpirun or equivalent command to turn this warning off.

You might also wish to see the question regarding messages about error/warning aggregation.

FAQ IV-8) Why is my job complaining about not finding the 'module' command ?

This typically happens when you are running a job in a shell which is different than your default login shell. E.g., your default shell is 'tcsh' (which will be the case unless you manually changed it at some point) and you are running a bash job script.

In scenarios of the type described above, your initialization dot files do not get invoked before Slurm starts the job script on the first compute node assigned to your job. Therefore, certain initializations which you tend to expect (e.g. the definition of the "module" command) do not occur, resulting in this error.

To resolve this is relatively simple, just explicity invoke the appropriate initialization scripts at the start of your job script (right after the last "#SBATCH" line), e.g.

#!/bin/bash
#SBATCH -t 1:00
#SBATCH -n 8
#SBATCH --mem-per-cpu=4096
#	 any other SBATCH directives you need/desire 
#
. ~/.profile

#	 the rest of your job script

FAQ IV-9) What does "Quota" mean in the status field for a job when using the showq command?

Please see the question about "(AssocGrpCPUMinsLimit)". That condition manifests with the Quota message with the showq command.

FAQ V) Issues running jobs

FAQ V-1) My job is spending a lot of time in the queue. Why? When will it start?

The HPC clusters have a large number of nodes, and many, many cores. However, we also have many users, and some of them submit very large jobs. We make no promises about the maximum amount of time a job can be queued. If the system is lightly loaded, most jobs will start within a few minutes of being submitted (there is some small overhead to the batch system). When the cluster is heavily used, wait times of hours to a significant fraction of a day are not to be unexpected.

If your job is taking a while to run, first check its status with the squeue command. Pay attention to the NODELIST(REASON) field. If it shows (Resources) or (Priority), that means that either there are no nodes available for it to run on (i.e. other jobs are running on the nodes it needs), or that there are higher priority jobs before it in the queue. These conditions usually indicate that the cluster is busy, and that it will start your job when resources are available and it is your job's turn. See the section on getting an estimate of when your job will start for more information.

If the NODELIST(REASON) field is showing something else, there might be something wrong which is preventing your job from starting. Typical cases are (AssociationJobLimit) and (QOSResourceLimit) which indicate that you have exceeded either your allocations funds or the maximum number of allowed jobs which a single user can run at one time.

FAQ V-2) How can I reduce the amount of time my job spends waiting in the queue?

Please see the following for a more detailed overview of the scheduling process, and the factors that affect scheduling. But for a quick answer, to minimize the amount of time your jobs spend in the queue:

For short (under 15 minute) debugging jobs, use the debug partition.
Do NOT use the scavenger partition, as it has the lowest priority of all partitions.
Do not request an excessive walltime. Your job will be killed, finished or not, when the walltime runs out, so a bit of padding is recommended, but be reasonable. If your job should finish within 4 hours, don't request a walltime of 8 hours.
While you should always request the resources you expect to need, and maybe with some padding since these are often estimates, do not be excessive. E.g., requesting 3 GB/core of RAM when in reality only 2 GB/core are needed can result in your job spending more time in the queue.

FAQ V-3) My batch job finishes as soon as it starts with no/little output

While these symptoms can have a number of causes, frequently the cause is quite simple --- the last line in your job script is missing a proper Unix end-of-line (EOL) character. This can often happen if you edit the script on Windows and then transfer to Unix, as the standard EOL characters differ on the two OSes.

When your script does not have a proper EOL terminating the last line, the shell ignores/does not execute the last line. As often the last line is the line wherein you issue the command which is going to do all the real work, this often means that your job starts, does some simple housekeeping commands (loading modules, change working directory, etc), then skips the last line (with the missing EOL), which would have done the real work, and exits (successfully) almost immediately afterwards.

While there are various ways to ensure the file ends with a proper EOL, the easiest solution is to just ensure that your script ends with a couple of blank lines. That does not actually fix the EOL issue, but does mean that if the final EOL somehow is missing, the line that is ignored is blank, and so you do not care if it is ignored.

FAQ V-4) My OpenMPI performances is much less than expected. What gives?

FAQ V-5) What does "(AssocGrpCPUMinsLimit)" or "(AssociationJobLimit)" mean?

See the answer to this question in a different section.

FAQ V-6) What does "(QOSResourceLimit)" mean?

See the answer to this question in a different section.

FAQ V-7) My openmpi job complains about CUDA libraries. What does this mean?

This is likely a harmless warning. See this question and answer for more details.

FAQ V-8) My OpenMPI job has a warning about 'N more processes have sent' some message. What does that mean?

OpenMPI job output will sometimes include output similar to this:

[compute-b8-48.zaratan.umd.edu:38553] 20 more processes have sent help message help-mpi-common-cuda.txt / dlopen failed
[compute-b8-48.zaratan.umd.edu:38553] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

This message is basically stating that a number of other (20 in this case) MPI tasks have generated the same error. OpenMPI normally assumes you only want to see errors and warnings once per job, not once per MPI task. So if more than one MPI task produces basically the same error message, it prints the error message once, and then prints a message like the one above to let you know that the message occurred multiple times. This is usually what you want --- typically the message occurs on every MPI task in the job, and most people do not want to be inundated with the same error message hundreds or thousands of times.

This message usually can just be ignored, as it is just saying that the previous warning/error message occurred multiple times. Instead, you should focus on previous warning/error message.

As stated in the warning, if you add the flag --mca orte_base_help_aggregate 0 to your mpirun command, OpenMPI will not aggregate messages and you will see the messages from each and every MPI task that generated them.

FAQ V-9) What does "Quota" mean in the status field for a job when using the showq command?

See the answer for AsscGrpCPUMins.

FAQ VI) Questions about job accounting

FAQ VI-1) What is an SU? Or a kSU?

Accounting on the HPC clusters is done in units of Service Units, typically abbreviated as SUs. One SU is equal to 1 hour of wall time of a single core of a CPU. So, a job running for 4 hours on all the cores of 3 nodes, with each node having two 8-core CPUs, would accumulate a charge of 4 hours * 3 nodes * 2 CPUs/node * 8 cores/CPUs = 192 core-hours = 192 SU.

Both job charges and the funds in allocations are measured in SUs. The actual low-level accounting is done in core-seconds, but those numbers are unwieldy. Indeed, even SUs are unwieldy for many purposes, and we often talk in terms of kSUs, with 1 kSU = 1000 SU.

See the section on allocations and job accounting for more information.

FAQ VI-2) Which allocations do I have access to?

The sbalance with no arguments will list the balances for all allocations which you have access to on the Zaratan cluster.

If you see any allocations which you do not believe you should have access to, please contact us. If you believe you should have access to allocations which are not listed, contact the point of contact/owner of the allocation and have them request that you be granted access.

FAQ VI-3) Which allocation should I charge my job against?

If you are asking this question, presumably you have more than one allocation to which you have access. If not, you do not have a choice, and the system will automatically charge against the single allocation you have access to.

If the research groups, etc. that you belong to have local policies on which allocation you should charge, those take precedence over the advise in this FAQ. We are just providing some useful guidelines in the absence of any policies set by those in charge of the particular allocations.

If you belong to more than one project then you should choose the project that best fits the work that the job you are submitting. I.e., if you belong to an allocation for Dr. Jones and one for Dr. Smith, work for Dr. Jones should be charged against Dr. Jones's allocation and not Dr. Smith's allocation, and vica versa.

See the section on specifying the account to be charged for more information on specifying which allocation account your job should be charged against.

See the section on the allocation replenishment process, for more information, which might help clarify the above statements.

FAQ VI-4) Why am I receiving an error re disk quota exceeded?

To promote fair and efficient use of the disk space provided on the Zaratan cluster, we have quotas on all networked filesystems on the cluster. Due to the different underlying hardware and filesystems being used, the quotas and technical details behind the quota numbers vary depending on the system.

Home directory: Home directories have a strict quota of 10 GB per user. We do not increase this. Use the command home_quota to view your home directory quota and usage. More information can be found at the relevant page in the main usage documentation.
Scratch space: Scratch space is a high-performance file system intended for the storage of data pertaining to active jobs ( jobs that just completed, jobs that are running or in the queue, or jobs that are planned to be submitted shortly). Data should be deleted when no longer needed, and moved elsewhere (e.g. SHELL) if it should be retained but is no longer needed for active jobs. Use the scratch_quota command to see quota and usage, and see the main documentation for more information. NOTE: Scratch space is not backed up.
SHELL space: This is for medium term storage of data. Use the shell_quota command to view quota and usage stats. See the main SHELL documentation for more information, especially the pecularities of quotas on this storage tier. NOTE: SHELL space is not backed up.

If you are receiving errors about exceeding storage quotas, please use the above *_quota commands and the reported file paths to determine which filesystem is the source of the issue. Then delete unneeded files and if needed move files and data elsewhere (e.g. off the cluster, to SHELL, etc) as appropriate. If additional scratch or SHELL space is needed, you can request addtional storage from:

the campus Allocations and Advisory Committee (AAC)
your College or Department if your college/department has a resource pool to distribute.
purchase additional space from DIT. Our standard pricing list is online.

FAQ VI-5) Why do scratch_quota and du -h not agree?

If you are using the Unix du command to check on your scratch disk usage, you might notice that there are significant discrepencies in usages reported. This is do to a number of reasons, including (but not limited to) the fact that:

the scratch_quota command uses SI (base-10) units like kB, GB, TB whereas the du command usually defaults to powers of 2 based units like kiB, GiB, TiB . The difference between 1 TiB and 1 TB is almost 10%.
The actually accounting of usage differs between the du command and the underlying filesystem (on which the quota mechanism and the scratch_quota command acts). This includes factors like the accounting for partially filled blocks,

To get more accurate numbers reported by the du command>, we recommend using the flags --si --apparent-size to your du command. The former causes du to output using SI units, the latter causes the accounting algorithm to much more closely match what is used by the underlying filesystem.

Frequently Asked Questions about the University's High-Performance Computing Clusters

Introduction

Table of Contents