Please note that this page is still under construction; not all policies related to the Zaratan cluster are listed here yet.
The High Performance Computing (HPC) Clusters are part of the information technology resources that the Division of Information Technology makes available to the university community, and as such are covered by the campus Acceptable Use Policy (AUP). All users of the HPC clusters are required to adhere to the campus AUP in addition to the policies specific to the HPC clusters.
You should read and familiarize yourself with the Acceptable Use Policy. The AUP includes the following provisions which might be particularly applicable to users of the HPC clusters, but the list below is NOT complete and you are bound by all of the policies in the AUP.
In addition to the AUP, the HPC clusters have their own policies enumerated in this document. Among these are:
System notifications are sent to your University (e.g. USERNAME@terpmail.umd.edu) email address. You can read it on a campus mail system, or forward it to another address or email system which you do read, but that is the address at which system staff will contact you if we need to, and you are expected to be monitoring it.
The various HPC systems provided by the University are for the use of UMD faculty, students, and staff. If you do not have a current appointment with the University and are not currently registered for classes at the University, you are in general not eligible to have an account on any of the UMD-provided HPC clusters. This includes researchers who have moved to another university and students who have graduated and are not continuing on at UMD.
Because it is recognized that there are research and academic collaborations between people at UMD and people at other institutions, there is some provision for granting access to UMD resources to persons not formally associated with the University of Maryland when they are working with researchers at UMD. This is through the affiliate process; more information regarding the affiliate process can be found here.
People who were once associated with the University but are not currently associated with UMD (e.g. researchers who have moved on from UMD, students who have graduated from UMD, affiliates who were not renewed) will have their access to the HPC clusters revoked. The exact timing depends on the nature of the former association: student accounts will be disabled after two consecutive semesters for which they are not enrolled (i.e. about one year from graduation), while accounts for non-student researchers will typically expire between 1 and 6 months after the appointment is terminated, depending on the status of the appointment. Once the account is disabled, access to the clusters is disabled as well. In such cases, we ask that you delete any unneeded data from your home and lustre directories, and transfer any data worth saving off the system before your account expires; any remaining data will be disposed of pursuant to HPC policies.
If you are continuing to work with researchers at UMD and need to retain access to the clusters, you will need to have your UMD colleagues request affiliate status for you.
Access to the various HPC clusters requires a valid allocation to charge jobs against. You will be automatically granted access to a cluster when the designated point of contact for a valid allocation on that cluster requests that you be granted access to the allocation. Your access to the cluster will automatically be revoked when you are no longer associated with any valid allocations on the cluster. Your association with an allocation will terminate when any of the following occur:
If the allocation expires, you can try to renew it. Holders of allocations from Engineering should talk to Jim Zahniser; holders of allocations from CMNS should talk to Mike Landavere; and holders of allocations from the Allocations and Advisory Committee (AAC) should follow the instructions for applying for AAC allocations.
In all cases, we ask that you delete any unneeded files from the cluster, and move all files off the cluster before your access is disabled as a courtesy to other users of the clusters. Although any remaining data will be disposed of pursuant to HPC policies, removing the data yourself will free up space on the cluster sooner.
The login nodes are provided for people to access the HPC clusters. They are intended for setting up and submitting jobs, accessing results from jobs, transferring data to/from the cluster, compiling code, installing software, editing and managing files, and similar light tasks. As a courtesy to your colleagues, you should refrain from doing anything long-running or computationally intensive on these nodes, as it will interfere with the ability of others to use the HPC resources. Computationally intensive tasks should be submitted as jobs to the compute nodes (e.g. using sbatch or sinteractive), as that is what compute nodes are for.
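As a sketch of that workflow, a computationally intensive task can be wrapped in a batch script on a login node and handed to the scheduler with sbatch. The job name, resource requests, and program name below are illustrative placeholders, not site-mandated defaults:

```shell
#!/bin/bash
#SBATCH --job-name=my-analysis     # name shown in the queue (placeholder)
#SBATCH --time=01:00:00            # walltime limit for the job
#SBATCH --ntasks=4                 # number of tasks/cores requested
#SBATCH --mem=8G                   # memory requested

# The commands below run on a compute node, not on the login node
# where the script was submitted. Program and input are placeholders.
./my_program input.dat
```

Submit it with `sbatch myjob.sh`, or use `sinteractive` to get an interactive session on a compute node for hands-on work.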
Most compilations of code are short and are permissible. If you are doing a highly parallel or long-running compilation, you should consider requesting an interactive job and doing your compilation there as a courtesy to your colleagues.
Compute-intensive calculations are NOT allowed on the login nodes. If system staff find such jobs running, we will kill them without prior notification. Users found in violation of this policy will be warned, and continued violations may result in suspension of access to the cluster.
Do NOT run compute-intensive calculations on the login nodes
The Division of Information Technology and the various contributing research groups have provided large amounts of disk space for the support of jobs using the Zaratan HPC Cluster. The following policies discuss the use of this space. In general, the disk space is intended for support of research using the cluster, and as a courtesy to other users of the cluster you should try to delete any files that are no longer needed or being used.
All data on the HPC clusters, including home, scratch, and SHELL filesystems, are considered to be related to your research and not to be of a personal nature. As such, all data is considered to be owned by the principal investigator(s) for the allocation(s) through which you have access to the cluster.
All Division of Information Technology provided scratch filesystems are for the support of active research using the clusters. You must promptly remove your data files, etc. from the cluster when you no longer have jobs on the clusters requiring them. This is to ensure that all users can avail themselves of these resources.
The ONLY filesystems backed up by the Division of Information Technology on the HPC clusters are the homespaces. Everything else might be irrecoverably lost if there is a hardware failure. So copy your precious files (e.g. custom codes, summarized data) to your home directory for safety.
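As a sketch of that precaution (the project paths below are illustrative placeholders, not site conventions), copying results from scratch into the backed-up home space might look like:

```shell
#!/bin/bash
# Copy irreplaceable files (custom codes, summarized data) from scratch
# into the home directory, which is the only backed-up filesystem.
# Both paths are illustrative placeholders; substitute your own.
SRC="/scratch/zt1/$USER/my_project/results"
DEST="$HOME/my_project/results"

mkdir -p "$DEST"
cp -a "$SRC"/. "$DEST"/   # -a preserves timestamps and permissions
```

For large transfers, a tool such as rsync can resume interrupted copies, but the simple form above suffices for modest amounts of data.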
For the purposes of HPCC documentation and policies, the disk space available to users of the cluster is categorized as indicated below.
Scratch space (e.g. /scratch/zt1 on Zaratan) is provided by the Division of IT and is visible to all nodes in the cluster. All HPCC users can access it (although if your research group has its own data volumes, we request that you use those preferentially). Research-owned scratch space is just a reservation of the total scratch space for that research group, so there is no user-visible difference between storage owned by research groups and DIT-owned scratch storage. Scratch space is much better optimized for performance than the home space volumes, but jobs doing heavy I/O should still seriously investigate using local temporary space instead. Scratch storage is not backed up to tape, but allows for more storage than the home space volumes; still, remember to store critical data on the home space, which is backed up. Policies related to DIT-provided data space.
For Zaratan nodes, this amount is about 1.5 TB. This space is available for use by your job while it is running; any files left there are deleted when the job ends. This space is not backed up, and files will be deleted without notice when the job ends. This space is only visible to the node it is attached to; each node of a multinode job will see its own copy of /tmp, which will differ from /tmp on the other nodes. However, being directly attached, this space has significantly better performance than network-mounted volumes. Policies related to local temporary space.
These options are available for the storage of files and data not associated with active research on the cluster (such files should not be stored in scratch filesystems). This is useful for data which needs to be kept but is rarely accessed, e.g. after a paper is published. While there is no time limit on how long data can stay in these locations, you are still requested to delete items once they are no longer needed. Policies related to longer term storage.
The SHELL filesystem is the ONLY place provided by the Division of Information Technology for the storage of data not being actively used by computations on the cluster.
The DIT provided scratch space is NOT for archival storage. It is ONLY for the temporary storage of files supporting active research on the clusters. You must remove any data which is no longer needed for jobs you are running on the cluster promptly.
Scratch and SHELL spaces are NOT backed up.
Files in the scratch filesystem are subject to a 90-day purge policy: files older than 90 days will be automatically removed without warning. If you need to keep data longer than this, consider moving it to your SHELL space, or off of the cluster entirely.
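One way to check which of your scratch files have aged past the purge threshold (a sketch; the scratch path is an illustrative placeholder, so substitute your own directory) is find's -mtime test:

```shell
#!/bin/bash
# List files last modified more than 90 days ago, i.e. candidates for
# the automatic purge. SCRATCH_DIR is an illustrative placeholder.
SCRATCH_DIR="${SCRATCH_DIR:-/scratch/zt1/$USER}"

find "$SCRATCH_DIR" -type f -mtime +90 -print
```

Files listed that are worth keeping can then be moved to SHELL space or off the cluster before the purge removes them.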
Use local temporary space wherever it is feasible; this generally offers the best disk I/O performance. Contact hpcc-help if you have questions or need assistance with that.
Although any files under /tmp that belong to you will be deleted when you no longer have any jobs running on the node, it is good practice to delete files yourself at the end of the job where possible, especially if you run many small jobs that can share a node; otherwise it can take some time for the automatic deletion to occur, and that can reduce the available space in /tmp for other jobs.
Any files you own under /tmp on a compute node will be deleted once the last job of yours running on the node terminates (i.e. when you no longer have any jobs running on the node).
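One reliable way to do that cleanup yourself (a sketch; the directory naming is illustrative) is to create a per-job working directory under /tmp and remove it with an EXIT trap, so it is cleaned up even if the job script fails partway through:

```shell
#!/bin/bash
# Create a private working directory under /tmp and guarantee its
# removal when the script exits, whether it finishes normally or fails.
WORKDIR=$(mktemp -d "${TMPDIR:-/tmp}/myjob.XXXXXX")
trap 'rm -rf "$WORKDIR"' EXIT

# ... run I/O-heavy work inside "$WORKDIR" here ...
echo "working in $WORKDIR"
```

The trap fires on any exit path, so the space is freed immediately for other jobs sharing the node instead of waiting for the automatic deletion.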
The SHELL volumes are NOT backed up.
Over time, users on the cluster come and go. Because of the large volume of data that some users have, it is necessary to have policies regarding the disposal of this data when users leave the university or otherwise lose access to the cluster, both to protect potentially valuable data and to prevent valuable cluster resources from being needlessly tied up by files from users no longer on the cluster.
All active users on the Zaratan cluster belong to one or more allocations, and lose access to the cluster when they are no longer associated with any allocation, whether because they ceased to be associated with the University or with the research group owning the allocation, or because the allocation expired. When this happens: