
Policies relating to the Zaratan High-Performance Computing Cluster

Please note that this page is still under construction. Therefore not all policies related to the Zaratan cluster are currently listed here.

Table of Contents

  1. General Policies
    1. Access for non-UMD persons
    2. Access requires a valid HPC allocation
  2. Policies on Usage of Login Nodes
  3. Policies on Usage of Disk Space
    1. Policies Regarding Home Space
    2. Policies Regarding Division-Provided Data Volumes
    3. Policies Regarding Research Group Provided Data Volumes
    4. Policies Regarding Locally Attached Scratch Space
    5. Policies Regarding Data of Former Users

General Policies on Usage of High Performance Computing Clusters

The High Performance Computing (HPC) Clusters are part of the information technology resources that the Division of Information Technology makes available to the university community, and as such are covered by the campus Acceptable Use Policy (AUP). All users of the HPC clusters are required to adhere to the campus AUP in addition to the policies specific to the HPC clusters.

You should read and familiarize yourself with the Acceptable Use Policy. The AUP includes the following provisions which might be particularly applicable to users of the HPC clusters, but the list below is NOT complete and you are bound by all of the policies in the AUP.

In addition to the AUP, the HPC clusters have their own policies, enumerated in this document. Among these are:

Access for non-UMD persons

The various HPC systems provided by the University are for the use of UMD faculty, students, and staff. If you do not have a current appointment with the University and are not currently registered for classes at the University, you are in general not eligible to have an account on any of the UMD-provided HPC clusters. This includes researchers who have moved to another university and students who have graduated and are not continuing on at UMD.

Because it is recognized that there are research and academic collaborations between people at UMD and people at other institutions, there is some provision for granting access to UMD resources to persons not formally associated with the University of Maryland when they are working with researchers at UMD. This is through the affiliate process; more information regarding the affiliate process can be found here.

People who were once associated with the University but no longer are (e.g. researchers who have moved on from UMD, students who have graduated from UMD, affiliates who were not renewed) will have their access to the HPC clusters revoked. The exact timing depends on the nature of the former association. For example, student accounts are disabled after two consecutive semesters in which the student is not enrolled (i.e. about one year after graduation), while accounts for non-student researchers typically expire between 1 and 6 months after the appointment is terminated, depending on the status of the appointment. Once the account is disabled, access to the clusters is disabled as well. We ask that you delete any unneeded data from your home and lustre directories, and transfer any data worth saving off the system, before your account expires. Any remaining data will be disposed of pursuant to HPC policies.

If you are continuing to work with researchers at UMD and need to retain access to the clusters, you will need to have your UMD colleagues request affiliate status for you.

Access requires a valid HPC allocation

Access to the various HPC clusters requires a valid allocation to charge jobs against. You will be automatically granted access to the cluster when the designated point-of-contact for a valid allocation on the cluster requests that you be granted access to the allocation. Your access to the cluster will automatically be revoked when you are no longer associated with any valid allocations on the cluster. Your association with an allocation will terminate when any of the following occur:

If the allocation expires, you can try to renew it. Holders of allocations from Engineering should talk to Jim Zahniser, holders of allocations from CMNS should talk to Mike Landavere, and holders of allocations from the Allocations and Advisory Committee (AAC) should follow the instructions for applying for AAC allocations.

In all cases, we ask that you delete any unneeded files from the cluster, and move all files off the cluster before your access is disabled as a courtesy to other users of the clusters. Although any remaining data will be disposed of pursuant to HPC policies, removing the data yourself will free up space on the cluster sooner.

Policies on Usage of Login Nodes

The login nodes are provided for people to access the HPC clusters. They are intended for people to set up and submit jobs, access results from jobs, transfer data to/from the cluster, compile code, install software, edit and manage files, etc. As a courtesy to your colleagues, you should refrain from doing anything long running or computationally intensive on these nodes, as it will interfere with the ability of others to use the HPC resources. Computationally intensive tasks should be submitted as jobs to the compute nodes (e.g. using sbatch or sinteractive), as that is what compute nodes are for.

Most compilations of code are short and are permissible. If you are doing a highly parallel or long compilation, you should consider requesting an interactive job and doing your compilation there as a courtesy to your colleagues.

Compute intensive calculations, etc. are NOT allowed on the login nodes. If system staff find such jobs running, we will kill them without prior notification. Users found in violation of this policy will be warned, and continued violation may result in suspension of access to the cluster.

Do NOT run compute intensive calculations on the login nodes

Policies on Usage of Disk Space

The Division of Information Technology and the various contributing research groups have provided large amounts of disk space for the support of jobs using the Zaratan HPC Cluster. The following policies discuss the use of this space. In general, the disk space is intended for support of research using the cluster, and as a courtesy to other users of the cluster you should try to delete any files that are no longer needed or being used.

  1. All data on the HPC clusters, including home, scratch, and SHELL filesystems, are considered to be related to your research and not to be of a personal nature. As such, all data is considered to be owned by the principal investigator(s) for the allocation(s) through which you have access to the cluster.
  2. All Division of Information Technology provided scratch filesystems are for the support of active research using the clusters. You must remove your data files, etc. from the cluster promptly when you no longer have jobs on the clusters requiring them. This is to ensure that all users can avail themselves of these resources.
  3. The ONLY filesystems backed up by the Division of Information Technology on the HPC clusters are the homespaces. Everything else might be irrecoverably lost if there is a hardware failure. So copy your precious files (e.g. custom codes, summarized data) to your home directory for safety.

For the purposes of HPCC documentation and policies, the disk space available to users of the cluster is categorized as indicated below.

The SHELL filesystem is the ONLY place provided by the Division of Information Technology for the storage of data not being actively used by computations on the cluster.

A list of all data volumes

Policies on Usage of Home Space

  1. Do NOT start jobs from your home directory or subdirectories underneath it. Run the jobs from the scratch filesystem.
  2. Jobs should not perform significant I/O to/from homespace volumes. Use the scratch filesystem, or the locally attached temporary space (/tmp).
  3. Delete or move off the HPCC any files which are no longer needed or used.
  4. There is a 10 GB soft quota on home directories. This soft quota will not prevent you from storing more than 10 GB in your home directory. However, a daily check of disk usage will be performed, and if you are above the quota you will receive an email requesting that you reduce disk usage within a grace period of 7 days. The email reminders will continue until usage is reduced or the grace period is over. If you are still over quota at that time, system staff will be notified and more severe emails will be sent. Unless the situation is remedied promptly, system staff may be forced to take action, which could involve relocating or deleting your files. This soft quota approach is being taken to ensure all HPCC users get fair access to this critical resource without unduly impacting performance on the cluster, while allowing you some flexibility if you need to exceed the 10 GB limit for a few days.
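
As a rough self-check against the soft quota described above, the following Python sketch adds up the apparent sizes of the files under a directory and compares the total to 10 GB. It is an illustration only and not the cluster's actual quota check: the function names are hypothetical, the use of binary units (1024**3 bytes per GB) is an assumption, and apparent file sizes may differ from the usage the filesystem itself reports.

```python
import os

# Assumption: the 10 GB soft quota is interpreted in binary units.
HOME_SOFT_QUOTA_BYTES = 10 * 1024**3


def directory_usage_bytes(root):
    """Sum the apparent sizes of all non-directory entries under root.

    Symbolic links are not followed; each link counts only for itself.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total += os.lstat(path).st_size
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return total


def over_soft_quota(used_bytes, quota_bytes=HOME_SOFT_QUOTA_BYTES):
    """Return True when usage is strictly above the soft quota."""
    return used_bytes > quota_bytes
```

Calling over_soft_quota(directory_usage_bytes(os.path.expanduser("~"))) from an interactive session would give a quick indication of whether a cleanup is due. Walking a large home directory can take a while, so it is best done sparingly.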

Policies on Usage of Division of Information Technology Provided Data Space

  1. Delete or move off the HPCC any files which are no longer needed or used. This space is intended to provide temporary storage for the support of jobs running on the system; it is not for archival purposes. Files which are not actively being used by computations on the cluster must be removed promptly to ensure these resources are available for other users.
  2. Scratch filesystems are subject to a 90-day purge policy. Any files older than 90 days will be automatically removed without warning.
  3. Files in scratch or SHELL storage are not backed up.
The DIT provided scratch space is NOT for archival storage. It is ONLY for the temporary storage of files supporting active research on the clusters. You must remove any data which is no longer needed for jobs you are running on the cluster promptly.
Scratch and SHELL spaces are NOT backed up.
Files in the scratch filesystem are subject to a 90-day purge policy. This means that files older than 90 days will be automatically removed without warning. If you need to keep data longer than this, consider moving it to your SHELL space, or off of the cluster entirely.
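
To see which files might be at risk under the 90-day purge, a sketch along the following lines can list files whose age exceeds a threshold. The policy text does not say which timestamp the purge uses, so the choice of modification time (mtime) here is an assumption, and the function name is hypothetical.

```python
import os
import time

PURGE_AGE_DAYS = 90  # from the purge policy above


def files_older_than(root, days=PURGE_AGE_DAYS, now=None):
    """Return sorted paths under root whose mtime is more than `days` days old.

    Assumption: file age is measured by modification time.
    """
    if now is None:
        now = time.time()
    cutoff = now - days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(path).st_mtime
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
            if mtime < cutoff:
                stale.append(path)
    return sorted(stale)
```

Running this with a smaller threshold, for example days=75, would give some advance notice of files that should be moved to SHELL space or off the cluster.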

Policies on Usage of Locally Attached Temporary Space

  1. Please have your jobs use locally attached temporary space (/tmp) wherever it is feasible. This generally offers the best disk I/O performance. Contact hpcc-help if you have questions or need assistance with that.
  2. Files in locally attached temporary space are not backed up.
  3. Files in locally attached temporary space are deleted upon termination of the job.
  4. Although all files in /tmp that belong to you will be deleted when you no longer have any jobs running on the node, it is good practice to delete files yourself at the end of the job where possible. This is especially important if you run many small jobs that can share a node, as otherwise it can take some time for the automatic deletion to occur, which can reduce the available space in /tmp for other jobs.
Any files you own under /tmp on a compute node will be deleted once the last job of yours running on the node terminates (i.e. when you no longer have any jobs running on the node).
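
One way to follow the /tmp recommendations above from a Python job is sketched below: the work is done in a private directory under /tmp, the outputs are copied to a results directory, and the temporary directory is removed when the job step finishes. The names run_with_local_tmp, work, and results_dir are hypothetical, and the sketch assumes Python 3.8 or later.

```python
import os
import shutil
import tempfile


def run_with_local_tmp(work, results_dir, tmp_root="/tmp"):
    """Run work(workdir) in a private directory under tmp_root, then copy outputs back.

    `work` is a hypothetical callable that writes its output files into workdir.
    """
    os.makedirs(results_dir, exist_ok=True)
    # TemporaryDirectory removes the directory on exit, even if work() raises.
    with tempfile.TemporaryDirectory(dir=tmp_root, prefix="job_") as workdir:
        work(workdir)
        for name in os.listdir(workdir):
            src = os.path.join(workdir, name)
            dst = os.path.join(results_dir, name)
            if os.path.isdir(src):
                shutil.copytree(src, dst, dirs_exist_ok=True)
            else:
                shutil.copy2(src, dst)
    return results_dir
```

Because the cleanup happens inside the job itself, space in /tmp is freed immediately instead of waiting for the automatic deletion.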

DIT-provided longer term storage

  1. The SHELL volumes and Google's G drive are the ONLY DIT-provided storage where it is permissible to store files and data not associated with active research on the cluster. This storage can be used to archive data, e.g. data that needs to be kept for a while after a paper is published.
  2. The SHELL volumes are only available from the login nodes of the Zaratan cluster, or to external clients. They are NOT available from the compute nodes.
  3. Do not use this storage for active jobs.
  4. These volumes are NOT backed up.
  5. Google's G drive storage is NOT on campus, and as such there may be restrictions on what types of data are allowed to be stored there (from a security perspective). Please see the Google drive service catalog entry for more information regarding this.
    The SHELL volumes are NOT backed up.

    Policies Regarding Data of Former Users

    Over time, users on the cluster will come and go. Because of the large volume of data that some users have, it is necessary to have policies regarding the disposal of this data when users leave the university or otherwise lose access to the cluster in order to protect potentially valuable data but also prevent valuable cluster resources from being needlessly tied up due to files from users no longer on the cluster.

    Disposal of Former User Data on the Zaratan Cluster

    All active users on the Zaratan cluster belong to one or more allocations, and lose access to the cluster when they no longer are associated with any allocations, be it because they ceased being associated with the University or the research group owning the allocation, or the allocation expired. When this happens:

    1. All of the data owned by the user (in their home directory and in their scratch or SHELL directories) is "quarantined", i.e. it is relocated and access to it is disabled for all users, but it still consumes space on the filesystem. This ensures that anyone who is using this data, knowingly or not, quickly notices that it is gone, so that things can hopefully be resolved before the data is permanently deleted. If you need (or think you need) access to data from someone whose access was recently disabled,
    2. For every allocation the user who previously owned the data was in just before the user was disabled, the points of contact (PoCs) of those allocations will receive an email informing them that the data previously owned by that user is slated for deletion. Emails will be repeated monthly as long as the data remains "quarantined" (or until the PoCs give approval for early deletion of the data).
    3. NOTE: Only the PoCs for allocations that the user belonged to just before being disabled will receive these notifications. E.g., if a user is a member of allocations AllocA and AllocB, and then is removed from AllocA (typically at the request of a PoC for AllocA), and then some weeks or months later is removed from AllocB (either due to expiration of the account or at the request of a PoC), only PoCs for AllocB will receive recurring emails about the "quarantined" data. PoCs for AllocA will receive an email at the time that the user is removed from AllocA, and this email will mention that they should make arrangements regarding transfer of ownership of any data belonging to AllocA, but no data is "quarantined" at that time (because the user still has access to the cluster).
    4. The data will remain in "quarantine" until one of the following conditions occurs:
      • The expiration date for the data has passed. The expiration date is set to 6 months after the account was originally disabled. At this point, the data has been "quarantined" for (and therefore not accessed for at least) six months, and so is beyond the age at which DIT staff reserve the right to delete anyway.
      • PoCs representing all of the allocations with which the user had been associated just before being disabled have given approval for the early deletion of the data. This is to allow freeing up of resources ahead of the normal 6 month policy, but is only done if representatives of ALL allocations agree to it.
      • If any PoC from an allocation the user had been associated with just before the account was disabled requests that the data be transferred to another user, the data will be transferred (as per HPC policy, all data belongs to the allocations of which the user is a member). This does not require consent of all PoCs involved, and it is assumed that should multiple PoCs need different parts of the data, things can be worked out in a friendly manner.
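
The conditions above can be summarized in a small decision function. This is only an illustrative sketch of the policy as described here, not the tooling that system staff actually use: the names are hypothetical, and both the handling of month-end dates and the choice to let a transfer request take precedence over deletion are assumptions.

```python
import calendar
from datetime import date

QUARANTINE_MONTHS = 6  # expiration is 6 months after the account was disabled


def add_months(d, months):
    """Return d shifted forward by `months`, clamping the day to the month's end."""
    index = d.year * 12 + (d.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def quarantine_status(disabled_on, today, approvals, transfer_requested=False):
    """Classify quarantined data as 'transfer', 'delete', or 'hold'.

    `approvals` maps each allocation the user belonged to just before being
    disabled to True if a PoC for that allocation approved early deletion.
    """
    # Assumption: a transfer request from any PoC takes precedence.
    if transfer_requested:
        return "transfer"
    # The expiration date has passed.
    if today > add_months(disabled_on, QUARANTINE_MONTHS):
        return "delete"
    # Early deletion requires approval on behalf of ALL allocations.
    if approvals and all(approvals.values()):
        return "delete"
    return "hold"
```

For example, with approvals of {"AllocA": True, "AllocB": False} the data stays on hold until the six-month expiration, because approval is missing for AllocB.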
