Files, Storage, and Securing your Data

On the cluster, you have several options available to you regarding where files are stored. This page discusses the various options and the differences between them in terms of performance, backup, quotas, security, and policies.

  1. Overview and comparison of the various storage tiers
  2. Home directories
    1. Quotas on home directories
    2. Security and home directories
  3. Node Local /tmp directory
    1. Security and /tmp directories
  4. Scratch/Short-term/High-performance File Systems
    1. Project-based directories
    2. Filesystem types
      1. The BeeGFS Filesystem
        1. Scratch quotas on BeeGFS
        2. Permissions and ACLs for BeeGFS
        3. Striping on BeeGFS
      2. The Lustre Filesystem
  5. SHELL (Medium Term) File System
    1. Project-based directories
    2. Volumes and Quotas on SHELL
    3. SHELL storage and AFS tokens
    4. Access SHELL storage from outside the cluster
      1. From Windows systems:
        1. Passwordless ssh from Windows to cluster using kinit
        2. Mounting SHELL directories on a Windows system
      2. From Apple/Mac systems
      3. From Linux systems
  6. Archival Storage
    1. Google G Suite Drive
    2. UMD Box Service
  7. Policies regarding usage of Disk Space on the HPC clusters

Overview and Comparison of the Different Storage Tiers

There are several tiers of storage on an HPC cluster, differing in the amount of available storage, their performance, policies, etc. The basic tiers available on HPC systems at UMD are summarized and compared below, to help you select the most appropriate tier to use for a given need. Click on the tier name for more detail.

Storage Tier: Home Directory
  Visibility: all nodes on cluster
  Technology: NFS, backed by physical disk
  Performance: standard
  Size: 10 GB/user
  Permanence: lifetime of cluster; backed up
  Suggested use cases: 1) small precious data; 2) build scripts/configuration for building codes

Storage Tier: Local /tmp directory
  Visibility: only the local compute node
  Technology: Zaratan standard nodes: SATA SSD; Zaratan serial nodes: NVMe SSD
  Performance: high performance
  Size: Zaratan standard nodes: >~ 1 TB/node; Zaratan serial nodes: >~ 10 TB/node
  Permanence: temporary; deleted after job ends
  Suggested use cases: 1) temporary disk workspace for a job; 2) staging job inputs

Storage Tier: High Performance/Scratch filesystem
  Visibility: all nodes on cluster
  Technology: Zaratan: BeeGFS
  Performance: high performance
  Size: Zaratan: 2 PB total, quota-ed by group
  Permanence: short term; data for active jobs only; not backed up
  Suggested use cases: 1) input data for jobs; 2) output from jobs; 3) checkpoint files; 4) files should be deleted/moved once the job completes

Storage Tier: SHELL medium-term filesystem
  Visibility: Zaratan login nodes and any system with an Auristor/AFS client; not available from compute or DTN nodes
  Technology: Auristor (similar to AFS)
  Performance: standard
  Size: Zaratan: 7.9 PB total, volume-based quotas by group
  Permanence: medium term; not backed up
  Suggested use cases: 1) storage of input data not needed for current jobs, but which might be needed in 6 months or so; 2) storage of results from jobs that need to be retained for a year or two

Storage Tier: Archival storage
  Visibility: external to the Zaratan cluster; access from login or DTN nodes only
  Technology/Size: no archival storage is provided as part of the Zaratan cluster; we discuss options outside of the cluster
  Performance: slow
  Permanence: long term
  Suggested use cases: 1) long term storage of data as required by grants or publications; 2) long term storage of precious results
WARNING
While the Zaratan cluster maintains reasonable security, it is not certified or approved for the storage of sensitive information. No classified data, CUI, or HIPAA data are allowed on the cluster, nor any data at other than Low (Level 1) classification in the UMD Data Classification Table.

Your home directory

Your home directory is the directory you are placed in when you first log into the cluster. This filesystem is not optimized for high performance, so it should not be used for input or output files for jobs. It is, however, the only filesystem on the cluster which is backed up, so it is suitable for your most precious data; but because it is backed up, it is costlier than the other tiers and is therefore strictly limited in size.

Your home directory is by default private to you, and should be used as little as possible for data storage. In particular, you should NOT run jobs out of your home directory --- run your jobs from the scratch filesystem; this is optimized to provide better read and write performance to improve the speed of your job. After the job is finished, you might wish to copy the more critical results files back to your home directory or to a SHELL directory. Your home directory gets backed up nightly. (The scratch and SHELL filesystems are not backed up.)

WARNING
Do not run jobs out of your home directory, or run jobs doing extensive I/O from your home directory, as it is NOT optimized for that.
WARNING
Your home directory is the ONLY directory that gets backed up by the Division of IT. You should copy your precious, irreplaceable files (custom codes, summarized results, etc) here.
WARNING
While the Zaratan cluster maintains reasonable security, it is not certified or approved for the storage of sensitive information. No classified data, CUI, or HIPAA data are allowed on the cluster, nor any data at other than Low (Level 1) classification in the UMD Data Classification Table.

Topics related to home directories:

Quotas on home directories

Home directories on the Zaratan cluster are limited by a 10 GB "soft quota" policy. Realizing that the need for storage can sometimes vary dramatically over the span of a few days, we have adopted a policy with some flexibility in this regard. You can temporarily (for up to a week) store up to double the 10 GB quota (i.e. up to 20 GB) in your home directory. But you must bring your usage under the 10 GB quota within seven days, or you will not be able to store any more data in your home directory.

Note that unlike the group-based quotas on the scratch and SHELL storage tiers, the quota on your home directory is personal. The amount of storage available to you in your home directory depends solely on your own usage, not on the usage of other members of your projects.

You can check your home space quota and usage with the command home_quota. Without any arguments, this command will show your home space usage, quota, percent used, number of files, file limit, and percent of the file limit, along with the time your grace period ends if you are in one, all in a human-readable format. There are a number of options to control the output; you can use the -h flag to see an explanation of these.

If you are comparing the usage, etc. returned by the home_quota command with the usage reported by the du command, note that home_quota by default uses SI units (e.g. 1 GB = 10^9 bytes, and similarly for other prefixes), whereas du by default uses binary units (1 GiB = 1024^3 bytes) -- see the glossary entries for GiB and GB for more information. If you use the du command, it is recommended that you use the additional flags --si --apparent-size to get more directly comparable results.
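
For example, a minimal comparison might look like the following (the du flags shown are standard GNU du options):

home_quota                       # usage and quota, reported in SI units
du -s --si --apparent-size ~     # total apparent size of your home directory, also in SI units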

Security and home directories

As the HPC clusters are intended for research, all content in your home directory, like all other data on the cluster, is considered to be research related, and is at some level considered property of the University and the faculty PIs of the allocations through which you receive access to the cluster. However, the default permissions on your home directory are such that only you (and systems staff) have access to the contents.

Certain directories, like ~/.ssh, may contain security-related data (e.g. private keys for ssh public key authentication) which needs to be kept private and readable only by you.

You can use the standard Unix chmod command to alter the permissions on files and directories under your home directory to allow others access; however, we recommend you consider using scratch or SHELL storage for this instead --- there are even shared folders in the scratch and SHELL spaces for the projects to which you belong, to facilitate the sharing of data among members of the same project. If you do open up your home directory, please remember to ensure that more sensitive data (like the contents of your .ssh folder) remains readable only by you.
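
For example, a minimal sketch of keeping your ssh configuration locked down with standard Unix commands:

chmod 700 ~/.ssh        # directory accessible only by you
chmod 600 ~/.ssh/*      # keys and configuration files readable/writable only by you
ls -la ~/.ssh           # verify the resulting permissions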

Temporary Node-local Storage

All of the compute nodes on the UMD HPC clusters have temporary local storage mounted at /tmp which is readable and writable by all users of the clusters. Being local, files stored in /tmp on a given node are only accessible by other processes running on that same node; i.e. they are not accessible from other nodes, including the login nodes. But since it is local, it is not subject to network latency and bandwidth limitations, and so tends to be faster than many networked file systems, although the high-performance scratch tier can sometimes outperform it. Performance on the /tmp filesystems also tends to be more consistent than scratch performance: the scratch filesystem is shared by many jobs and users over many nodes, which can impact performance at times, whereas the /tmp filesystem is only affected by the much smaller number of other processes running on the same node, many of which are usually part of your own job.

This storage is temporary: any files you place in this directory will be deleted once you no longer have any jobs running on the node. Typical use cases include temporary disk workspace for a job and staging of job inputs (see the table above).

All of these use cases assume that it is acceptable for the data to be accessible only from the local node.
WARNING
All data in /tmp belonging to you will be deleted once your jobs on the compute node finish.

If you use the /tmp directory, it is advisable to make your own directory underneath /tmp, and set restrictive permissions so other users on the node cannot access the data. You might even wish to make a job-specific directory, to reduce the chance of your jobs interfering with each other. You can do this with a code snippet like the following (in bash)

TMPDIR="/tmp/${USER}-${SLURM_JOB_ID}"
export TMPDIR
mkdir $TMPDIR
chmod 700 $TMPDIR
or (in tcsh)
setenv TMPDIR "/tmp/${USER}-${SLURM_JOB_ID}"
mkdir $TMPDIR
chmod 700 $TMPDIR
Put the appropriate snippet near the top of your job script, after any #SBATCH lines but before the script starts doing any serious work.
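
For example, a minimal sketch of a bash job script using this approach (the resource requests and program name are illustrative):

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# Create a private, job-specific directory on the node-local disk
TMPDIR="/tmp/${USER}-${SLURM_JOB_ID}"
export TMPDIR
mkdir $TMPDIR
chmod 700 $TMPDIR

# ... run your actual workload here, e.g. ./my_program, using $TMPDIR for temporary files ...

# Optionally clean up before the script ends (the data will be removed after the job anyway)
rm -rf $TMPDIR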

The nodes in the standard partition on Zaratan have at least 1 TB of storage available in /tmp. This is based on solid state disks, but using a traditional disk interface (SAS) instead of the faster NVMe interface. The nodes in the serial partition on Zaratan have at least 10 TB of solid state disk space mounted on /tmp, and this is NVMe based so is quite fast.

The Intel nodes on Juggernaut have at least 700 GB of temporary storage per node provided by spinning hard drives. The AMD nodes on Juggernaut have at least 300 GB of solid-state disk based storage on /tmp.

See the documentation on submitting jobs for information on how to specify the amount of temporary space needed by your job.
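
As a sketch, assuming the cluster honors Slurm's standard --tmp option for requesting node-local temporary disk space (check the job submission documentation for the exact method supported here):

#SBATCH --tmp=100G      # request at least 100 GB of node-local /tmp space (illustrative size)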

High-performance/Scratch Storage Tier

WARNING
While the Zaratan cluster maintains reasonable security, it is not certified or approved for the storage of sensitive information. No classified data, CUI, or HIPAA data are allowed on the cluster, nor any data at other than Low (Level 1) classification in the UMD Data Classification Table.

A networked file system on an HPC cluster must be able to support heavy I/O from a large number of processes running on a large number of nodes. This requires a high performance filesystem to keep up with the potential load. Generally this is done by spreading the data over a large number of file servers; with the appropriate configuration, even large single files get spread over multiple servers. This increases the ratio between the number of file servers and the amount of storage, which greatly increases the cost per terabyte but also greatly increases the file system performance, as it allows the different tasks of a large parallel job to access different parts of the same file without overwhelming a single file server.

Because of the high relative cost, high performance file systems should only be used for storing files related to active jobs (i.e. jobs that are currently running, recently finished, or in the pipeline, or for ongoing research for which you are regularly submitting jobs). They are not meant for archival storage of any type. For this reason, the high performance file systems are often referred to as scratch file systems. Typically, input data should be downloaded to the scratch file system (or copied from the medium-term/SHELL file system) before the job is submitted. After the job completes, the inputs can be deleted (or returned to the SHELL storage tier), unneeded temporary files should be deleted, and precious output moved to longer term storage (e.g. SHELL or your home directory).
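
As an illustrative sketch of this workflow (all paths and filenames are hypothetical; ~/scratch.foo is the project scratch symlink described below):

# Before submitting: stage the inputs into scratch space
mkdir -p ~/scratch.foo/myjob
cp -r ~/my_inputs ~/scratch.foo/myjob/
# ... submit the job, with input and output paths pointing into ~/scratch.foo/myjob ...
# After the job finishes: keep the precious results, then clean up
cp ~/scratch.foo/myjob/summary_results.dat ~/
rm -rf ~/scratch.foo/myjob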

WARNING
The scratch filesystems are for the temporary storage of files supporting active research on the cluster only. They are NOT for archival storage. Files more than 90 days old on the scratch filesystems are subject to deletion without notice by systems staff. Please note that the scratch filesystem is NOT backed up. If you have critical data that must be saved, be sure to copy it elsewhere. You are responsible for making backup copies of any valuable data.
WARNING
Because much of the data generated on the cluster is of a transient nature and because of its size, data stored in the scratch and SHELL filesystems is not backed up. This data resides on RAID protected filesystems, however there is always a small chance of loss or corruption. If you have critical data that must be saved, be sure to copy it elsewhere.

Project-based Directory Structure

NOTE: As the HPC clusters are intended for research, all content under a project's scratch directory tree, like all other data on the cluster, is considered to be research related, and is at some level considered property of the University and the faculty PIs of the allocations through which you receive access to the cluster.

With Zaratan, we have switched to a project-based file organizational structure for scratch. If you are a member of a project named foo (e.g. you have access to a Slurm allocation whose name starts with foo-, such as foo-aac) on Zaratan, you will have access to a directory tree starting at /scratch/zt1/project/foo. This directory is, by default, only accessible to members of the associated project. The managers of the project will have write permission to this directory. Underneath it, there will be two directories by default: shared and user.
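
For example, for the hypothetical project foo above, a member of the project would see something like:

login.zaratan.umd.edu> ls /scratch/zt1/project/foo
shared  user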

By default, all members of the project will have read-write access to the shared directory for the project. This is intended to facilitate collaboration among members of the research team. If there are static data files to be shared among the team, but which should be read only, you can place them here, but it is recommended you remove the group write permission on such data to prevent other users from accidentally overwriting it (see the sketch below).
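
A minimal sketch of doing that with standard Unix commands (the dataset name is illustrative):

# Place a shared reference dataset in the project's shared directory, then make it read-only for the group
cp -r ~/ref_data /scratch/zt1/project/foo/shared/
chmod -R g-w /scratch/zt1/project/foo/shared/ref_data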

Every user in the project receives a "personal" directory under the user subdirectory. By default, this directory and everything underneath it is only readable by the user it is named after; however, the contents are group-owned by the Unix group for the research project. The user can opt to grant access to specific subdirectories of their "personal" directory to the entire project or to select subsets (see the example below).
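
For example, to let the project group read one subdirectory of your "personal" scratch directory (a sketch with illustrative names; the parent directory must also be made traversable by the group):

# Allow the project group to traverse (but not list) your personal directory
chmod g+x /scratch/zt1/project/foo/user/YOURUSERNAME
# Grant the group read access to one specific subdirectory and its contents
chmod -R g+rX /scratch/zt1/project/foo/user/YOURUSERNAME/shared_results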

To facilitate access to your "personal" directory, the system will by default create a symlink in your home directory named scratch.foo (where foo is the name of the project under the /scratch/zt1/project directory). This link is a "pointer" to your "personal" scratch directory for the specified project, similar to a shortcut on Windows. You can cd to it, or use it in paths, and it will be resolved to your personal directory in scratch space. E.g., if you have a file /scratch/zt1/project/foo/user/YOURUSERNAME/somedir/somefile.txt, the command cat ~/scratch.foo/somedir/somefile.txt will output the contents of the file. Note that there is only a single copy of the file: if you do rm ~/scratch.foo/somedir/somefile.txt, that file will be deleted, and will no longer be accessible under either path.

Since most people only belong to a single project, we also create a shortened symlink scratch in your home directory, which is the same as scratch.foo if you only belong to a single project. If you belong to multiple projects (e.g. foo and bar), then scratch will still be defined, and it will point to either scratch.foo or scratch.bar, depending on the order in which you were added to the two projects. You can see which by issuing the command ls -l ~/scratch (do not give a trailing '/' to scratch). To change what it points to, you can issue an ln -sfn command; e.g. if it points to foo but you want it to point to bar, you can give the command ln -sfn ~/scratch.bar ~/scratch (the -n flag ensures the symlink itself is replaced, rather than a new link being created inside the directory it points to). Again, even though there are multiple paths that you can use to reach the data, only one copy exists.
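
For example (using the hypothetical projects foo and bar from above):

ls -l ~/scratch                    # shows which project scratch directory the link currently points to
ln -sfn ~/scratch.bar ~/scratch    # repoint the link to the bar project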

Juggernaut still uses a user-based directory structure for its scratch filesystem, but that will likely change soon.

Filesystem types for scratch

There are two main technologies used at UMD for these high performance scratch filesystems, and the cluster which you are using determines which technology is employed:

The Zaratan cluster provides 2 PB of BeeGFS based high performance scratch storage, and Juggernaut provides 1.5 PB of Lustre based scratch storage.

For the most part, you can use either filesystem without really paying attention to the underlying technology. However, to avail yourself of more advanced features (such as checking quotas, setting permissions and ACLs, or controlling striping), you need to know the filesystem-specific commands for doing so. These are discussed in more detail below, by filesystem:
  1. Using BeeGFS
  2. Using Lustre

Using BeeGFS scratch space

WARNING
The BeeGFS filesystems are NOT BACKED UP. Any valuable data should be copied elsewhere (home directory or off cluster) to prevent loss of critical data due to hardware issues. You are responsible for backing up any valuable data.

This section discusses the usage of the BeeGFS scratch filesystem found on some UMD HPC clusters. The Zaratan HPC cluster has a 2 PB BeeGFS based scratch filesystem.

  1. Scratch quotas on BeeGFS
  2. Permissions and ACLs for BeeGFS
  3. Striping on BeeGFS
Scratch Quotas (BeeGFS)

The scratch filesystems have quota limits in place to prevent excessive use. However, to ensure there is adequate space for everyone using the cluster, this space is still only to be used for temporarily storing files in active use by jobs and projects currently running on the system. I.e., when your job is finished, remove all the data files, etc. that are no longer needed. See the sections on archival storage or SHELL storage for a discussion of some of the storage options available to you if you need to retain the data for longer periods of time.

To ensure adequate scratch space remains available for all users, the scratch filesystems are subject to an automatic purge policy. Files older than 90 days are subject to automatic removal without warning. Data that needs to be kept on the cluster for longer periods of time should be kept in the medium-term SHELL storage. Users are responsible for moving their own data between the various filesystem types.
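
To get a rough idea of which of your files are approaching that limit, you can use the standard find command (a sketch; ~/scratch.foo is the project scratch symlink described above, and the purge may key on a different timestamp than the one shown):

# List files under your personal scratch tree that have not been modified in the last 80 days
find ~/scratch.foo -type f -mtime +80 -ls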

Every allocation in a project has an associated amount of scratch storage included in the allocation (although such may be 0). The scratch allotments from each of the allocations underneath a project are summed together to obtain a combined total scratch quota for the project. All members of any allocation underneath the project receive equal access to the entire combined scratch quota.

Note that by default, we only apply quotas at the project level, so e.g. if a project lists a 1 TB scratch quota, that means that the combined scratch usage of all members of that project must not exceed 1 TB. If your colleagues are already consuming that 1 TB, then there is nothing left for you. Such matters are best worked out at the research group level; preferably by the team members involved, with the PI and/or project managers stepping in if needed. If necessary, we might be able to apply per user quotas for the problematic users, but we prefer not to do so if possible. By default, there are no per user quotas on scratch space.

You can check the scratch filesystem quota and usage for the projects to which you belong with the scratch_quota command. Without any arguments, this command will display the disk space quota and usage for all projects you belong to, and your personal usage. Most people only belong to a single project, in which case the output would look like

login.zaratan.umd.edu> scratch_quota 
# Group quotas
          Group name     Space used    Space quota   % quota used
            zt-test      532.462 GB       1.100 TB         48.41%
# User quotas
           User name     Space used    Space quota   % quota used  % of GrpTotal
             payerle     316.717 GB      unlimited              0         59.48%

In the example above, the user belongs to a single project, with a corresponding Unix group zt-test, which has a scratch quota of 1.1 TB. The members of the project have a combined usage of 532 GB, which is 48% of the quota. In the User quotas section, we see the username of the user who ran the command (payerle), and that he is using 317 GB, which is 59% of the total usage of the project (the 532 GB above). The unlimited under the Space quota column for that user means that there is not a user-specific quota being applied; this user is still subject to the group quota (1.1 TB in this case). The % quota used column displays the usage as a percentage of the quota; for the user line, since there is no user-level quota, this value is zero. If a user-level quota were imposed, it would show how much of that quota has been used.

If you belong to multiple projects, the scratch_quota command without arguments will list the usage for each such project in the Group quotas section, along with a line totaling them all. The % of GrpTotal column in the users section will be in reference to that total.

The scratch_quota command has a fair number of optional flags that can be given to control its behavior; these can be enumerated in full with the --help flag. Some flags you might find useful: