Files, Storage, and Securing your Data

On the cluster, you have several options available to you regarding where files are stored. This page discusses the various options, and the differences between them in terms of performance, backup, quotas, security, and policies.

Overview and comparison of the various storage tiers
Home directories
1. Quotas on home directories
2. Security and home directories
Node Local /tmp directory
1. Security and /tmp directories
Scratch/Short-term/High-performance File Systems
1. Project-based directories
2. Filesystem types
  1. The BeeGFS Filesystem
  2. The Lustre Filesystem
SHELL (medium Term) File System
Archival Storage
1. Google G Suite Drive
2. UMD Box Service
Policies regarding usage of Disk Space on the HPC clusters

Overview and Comparison of the Different Storage Tiers

There are various different tiers of storage on an HPC cluster, differing in the amount of available storage, their performance, policies, etc. The basic tiers available on HPC systems at UMD are summarized and compared in the table below, to help you select the most appropriate tier to use for a given need. Click on the name in the tier column for more detail.

Storage Tier	Visibility	Technology	Performance	Size	Permanence	Suggested Use Cases
Home Directory	all nodes on cluster	NFS, backed by physical disk	standard	10 GB/user quota	lifetime of cluster backed up	1) small precious data 2) build scripts/configuration for building codes
Local /tmp directory	only the local compute node	Zaratan standard nodes: SATA SSD Zaratan serial nodes: NVMe SSD	high performance	Zaratan standard nodes: >~ 1 TB/node Zaratan serial nodes: >~ 10 TB/node	Temporary Deleted after job ends	1) temporary disk workspace for job 2) staging job inputs
High Performance/Scratch filesystem	all nodes on cluster	Zaratan: BeeGFS	high performance	Zaratan: 2 PB total, quota-ed by group	short term Data for active jobs only not backed up	1) input data for jobs 2) output from jobs 3) checkpoint files 4) files should be deleted/moved once job completes
SHELL medium term filesystem	Zaratan login nodes any system with Auristor/AFS client not from compute or DTN nodes	Auristor (similar to AFS)	standard	Zaratan: 7.9 PB total, volume based quotas by group	medium term not backed up	1) storage of input data not needed for current jobs, but which might be needed in 6 months or so. 2) storage of results from jobs that need to be retained for a year or two
Archival storage	External to the Zaratan clusters. Access from login or DTNs only		slow	No archival storage is provided as part of Zaratan cluster. We do discuss options outside of the cluster.	long term	1) long term storage of data as required by grants or publications 2) long term storage of precious results

While the Zaratan cluster maintains reasonable security, it is not certified or approved for the storage of sensitive information. No classified data, CUI, or HIPAA data are allowed on the cluster, nor any data at other than Low (Level 1) classification in the UMD Data Classification Table.

Your home directory

Your home directory is the directory that you are placed in when you first log into the cluster. This filesystem is not optimized for high performance, so it should not be used for input or output files for jobs. It is, however, the only filesystem on the cluster which is backed up, so it is suitable for your most precious data. Because it is backed up, it is costlier than the other tiers and therefore is strictly limited in size.

Your home directory is by default private to you, and should be used as little as possible for data storage. In particular, you should NOT run jobs out of your home directory --- run your jobs from the scratch filesystem; this is optimized to provide better read and write performance to improve the speed of your job. After the job is finished, you might wish to copy the more critical results files back to your home directory or to a SHELL directory. Your home directory gets backed up nightly. (The scratch and SHELL filesystems are not backed up.)

Do not run jobs out of your home directory, or run jobs doing extensive I/O from your home directory, as it is NOT optimized for that.

Your home directory is the ONLY directory that gets backed up by the Division of IT. You should copy your precious, irreplaceable files (custom codes, summarized results, etc) here.

Topics related to home directories:

Quotas on home directories
Security and home directories

Quotas on home directories

Home directories on the Zaratan cluster are limited by a 10 GB "soft quota" policy. Realizing the need for storage can sometimes vary dramatically over the span of a few days, we have adopted a policy with some flexibility in this regard. You can temporarily (up to a week) store up to double the 10 GB quota (i.e. up to 20 GB) in your home directory. But you must bring your usage under the 10 GB quota within seven days or you will not be able to store any more data in your home directory.

Note that unlike the group based quotas on the scratch and SHELL storage tiers, the quota on your home directory is personal. The amount of storage you have available in your home directory is solely influenced by the amount of storage you are using, and not by the usage of other members of your projects.

You can check your home space quota and usage with the command home_quota. Without any arguments, this command will show your home space usage, quota, percent used, files, file limit, and percent of file limit, along with the time of the end of your grace period if you are in a grace period, all in a human-readable format. There are a number of options to control the output, and you can use the -h flag to see an explanation of these.

If you are comparing the usage, etc., returned by the home_quota command with the usage reported by the du command, note that the home_quota command by default uses SI units (e.g. 1 GB = 10^9 bytes, and similarly for other prefixes), whereas the du command by default uses binary units (1 GiB = 1024^3 bytes); see the glossary entries for GiB and GB for more information. If you use the du command, it is recommended that you add the flags --si --apparent-size to get more directly comparable results.
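The difference in units can be seen on a small test file. The following is a minimal sketch, assuming GNU du (as found on most Linux systems); the temporary directory and the file name sample.dat are placeholders used only for illustration.

```shell
# Create a throwaway directory holding one file of exactly 1,000,000 bytes.
demo_dir=$(mktemp -d)
head -c 1000000 /dev/zero > "$demo_dir/sample.dat"

# Exact size in bytes, independent of any unit convention.
size_bytes=$(du --apparent-size --block-size=1 "$demo_dir/sample.dat" | cut -f1)
echo "bytes:        $size_bytes"

# SI units (powers of 1000), the convention the text says home_quota uses.
echo "SI units:     $(du --si --apparent-size "$demo_dir/sample.dat" | cut -f1)"

# Binary units (powers of 1024), which is what 'du -h' reports.
echo "binary units: $(du -h --apparent-size "$demo_dir/sample.dat" | cut -f1)"
```

To look at your real usage in units comparable to home_quota, the same flags can be applied to your home directory, e.g. du -s --si --apparent-size ~.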

Security and home directories

As the HPC clusters are intended for research, all content in your home directory, like all other data on the cluster, is considered to be research related, and is at some level considered property of the University and the faculty PIs of the allocations through which you receive access to the cluster. However, the default permissions on your home directory are such that only you (and systems staff) have access to the contents.

Certain directories, like ~/.ssh, may contain security related data (e.g. private keys for ssh public key authentication) which need to be kept private and readable only by you.

You can use the standard Unix chmod command to alter the permissions on files and directories under your home directory to allow others access; however we recommend you consider using scratch or SHELL storage for such instead --- there even are shared folders in the scratch and SHELL spaces for the projects to which you belong to facilitate the sharing of data among members of the same project. If you do open up your home directory, please remember to ensure that more sensitive data (like the contents of your .ssh folder) remains readable only by you.
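As an illustration of keeping sensitive data private while opening up a directory, here is a minimal sketch. It works on a throwaway directory standing in for your home directory, so nothing real is modified; demo_home and the key file name id_ed25519 are placeholders, and GNU stat is assumed.

```shell
# A throwaway directory stands in for your home directory in this sketch.
demo_home=$(mktemp -d)
mkdir -p "$demo_home/.ssh"
touch "$demo_home/.ssh/id_ed25519"   # placeholder for a private key file

# Open the (stand-in) home directory so group members can enter and list it.
chmod g+rx "$demo_home"

# Keep the ssh directory and everything in it accessible to the owner only.
chmod -R go-rwx "$demo_home/.ssh"

# Show the resulting permissions (octal mode followed by the path).
stat -c '%a %n' "$demo_home" "$demo_home/.ssh" "$demo_home/.ssh/id_ed25519"

ssh_mode=$(stat -c '%a' "$demo_home/.ssh")
key_mode=$(stat -c '%a' "$demo_home/.ssh/id_ed25519")
```

Running the chmod -R go-rwx step after any change that widens access is a cheap way to make sure the .ssh contents did not become readable by others.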

Temporary Node-local Storage

All of the compute nodes on the UMD HPC clusters have temporary local storage mounted at /tmp which is readable and writable by all users of the clusters. Being local, files stored in /tmp on a given node are only accessible by processes running on that same node; i.e. they are not accessible from other nodes, including the login nodes. But since it is local, it is not subject to network latency and bandwidth limitations and so tends to be faster than many networked file systems, although the high performance scratch tier can sometimes outperform it. Performance on the /tmp filesystems also tends to be more consistent than scratch performance: the scratch filesystem is shared by many jobs and users over many nodes, which can impact its performance at times, whereas the /tmp filesystem can only be impacted by the much smaller number of other processes running on the same node, many of which are usually part of your job.

This storage is temporary: any files you place in this directory will be deleted once you no longer have any jobs running on the node. Typical use cases include:

As storage for temporary files generated during the running of the job that are not needed once the job completes. In this case, no special treatment is needed, as the data is not needed after the job completes (and the system will take care of deleting it).
For input files that will be read frequently during the course of a job. In this case, you will normally have the original input files stored on another tier (e.g. scratch), copy the files to a per-job directory under /tmp at the start of the job, and do the heavy reading from the local copy on /tmp. Since it is just a copy, it is OK when it is deleted after the job completes.
For output files that should be retained after the job completes, by adding some lines to the job script to copy the data to the scratch filesystem just before the job ends. This can be dangerous in many situations and therefore is not generally recommended: if the job terminates abnormally for some reason, the copy will not occur, and any partial results (which might be useful in debugging why the job aborted) will be lost.

All of the above use cases assume that it is acceptable that the data is only accessible from the local node.

All data in /tmp belonging to you will be deleted once your jobs on the compute node finish.

If you use the /tmp directory, it is advisable to make your own directory underneath /tmp, and set restrictive permissions so other users on the node cannot access the data. You might even wish to make a job specific directory, to reduce the chance of your jobs interfering with each other. You can do this with a code snippet like (in bash)

TMPDIR="/tmp/${USER}-${SLURM_JOB_ID}"
export TMPDIR
mkdir -p "$TMPDIR"
chmod 700 "$TMPDIR"

or (in tcsh)

setenv TMPDIR "/tmp/${USER}-${SLURM_JOB_ID}"
mkdir $TMPDIR
chmod 700 $TMPDIR

Put the appropriate snippet near the top of your job script, after any #SBATCH lines but before the script starts doing any serious work.
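The input-staging use case described earlier can be combined with a per-job directory as in the following sketch. It is an illustration under stated assumptions, not a site-provided template: scratch_dir, SCRATCH_DIR, the file names, and the wc command standing in for real work are all hypothetical, and job_tmp uses the same path that the snippets above assign to TMPDIR. A temporary directory stands in for scratch so the sketch can be tried outside of a job.

```shell
# Placeholder locations: in a real job, scratch_dir would be a directory on
# the scratch filesystem; here a temporary directory stands in for it.
scratch_dir=${SCRATCH_DIR:-$(mktemp -d)}
mkdir -p "$scratch_dir/input"
echo "example input" > "$scratch_dir/input/data.txt"   # hypothetical input file

# Per-job private directory under /tmp (same naming as the snippets above).
job_tmp="/tmp/${USER:-$(id -un)}-${SLURM_JOB_ID:-nojob}"
mkdir -p "$job_tmp"
chmod 700 "$job_tmp"

# Copy results back to scratch and clean up. Registered with 'trap' so that it
# also runs when the script exits early because of an error.
copy_back() {
    if [ -d "$job_tmp/output" ]; then
        mkdir -p "$scratch_dir/output"
        cp -r "$job_tmp/output/." "$scratch_dir/output/"
        rm -rf "$job_tmp"
    fi
}
trap copy_back EXIT

# Stage the input onto the node-local disk and do the heavy I/O there.
cp -r "$scratch_dir/input" "$job_tmp/"
mkdir -p "$job_tmp/output"
wc -c < "$job_tmp/input/data.txt" > "$job_tmp/output/result.txt"   # stand-in for real work
```

The trap reduces, but does not remove, the risk of losing output kept on /tmp: it cannot run if the node crashes or the process is killed outright, so writing important output directly to scratch remains the safer choice.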

The nodes in the standard partition on Zaratan have at least 1 TB of storage available in /tmp. This is based on solid state disks, but using a traditional disk interface (SAS) instead of the faster NVMe interface. The nodes in the serial partition on Zaratan have at least 10 TB of solid state disk space mounted on /tmp, and this is NVMe based so is quite fast.

The Intel nodes on Juggernaut have at least 700 GB of temporary storage per node provided by spinning hard drives. The AMD nodes on Juggernaut have at least 300 GB of solid-state disk based storage on /tmp.

See for information on how to specify the amount of temporary space needed by your job.
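As a rough sketch of one way to handle this: Slurm's sbatch accepts a --tmp option for requesting a minimum amount of temporary disk space per node. Whether this is the mechanism recommended on these clusters is an assumption, as are the 100G and 1 GiB figures below. The script then checks how much space is actually free on /tmp before staging any data.

```shell
#!/bin/bash
#SBATCH --tmp=100G     # assumed example value: minimum /tmp space per node

# Space the job expects to need in /tmp, in KiB (hypothetical figure: 1 GiB).
needed_kb=$((1024 * 1024))

# 'df -Pk' prints POSIX-format output in 1024-byte blocks; field 4 of the
# second line is the space still available on the filesystem holding /tmp.
avail_kb=$(df -Pk /tmp | awk 'NR == 2 { print $4 }')

if [ "$avail_kb" -ge "$needed_kb" ]; then
    echo "/tmp has ${avail_kb} KiB free; enough for this job"
else
    echo "/tmp has only ${avail_kb} KiB free; ${needed_kb} KiB needed" >&2
fi
```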

High-performance/Scratch Storage Tier

A networked file system on an HPC cluster must be able to support heavy I/O from a large number of processes running on a large number of nodes. This requires a high performance filesystem to keep up with potential load. Generally this is done by spreading the data over a large number of file servers; with the appropriate configuration, even large single files get spread over multiple servers. This increases the ratio between the number of file servers and the amount of storage, greatly increasing the cost per terabyte but also greatly increasing the file system performance, as it allows the different tasks of a large parallel job to access different parts of the same file without overwhelming a single file server.

Because of the high relative cost, high performance file systems should only be used for storing files related to active jobs (i.e. jobs that are currently running, recently finished, or in the pipeline) or to ongoing research for which you are regularly submitting jobs. They are not meant for archival storage of any type. For this reason, the high performance file systems are often referred to as scratch file systems. Typically, input data should be downloaded to the scratch file system (or copied from the medium term/SHELL file system) before the job is submitted. After the job completes, the inputs can be deleted (or returned to the SHELL storage tier), unneeded temporary files should be deleted, and precious output moved to longer term storage (e.g. SHELL or your home directory).

The scratch filesystems are for the temporary storage of files supporting active research on the cluster only. They are NOT for archival storage. Files more than 90 days old on the scratch filesystems are subject to deletion without notice by systems staff. Please note that the scratch filesystem is NOT backed up. If you have critical data that must be saved, be sure to copy it elsewhere. You are responsible for making backup copies of any valuable data.

Because much of the data generated on the cluster is of a transient nature and because of its size, data stored in the scratch and SHELL filesystems is not backed up. This data resides on RAID protected filesystems; however, there is always a small chance of loss or corruption. If you have critical data that must be saved, be sure to copy it elsewhere.

Project-based Directory Structure

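NOTE: As the HPC clusters are intended for research, all content under a project's scratch directory tree, like all other data on the cluster, is considered to be research related, and is at some level considered property of the University and the faculty PIs of the allocations through which you receive access to the cluster.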

With Zaratan, we have switched to a project based file organizational structure for scratch. If you are a member of a project named foo (e.g. you have access to a Slurm allocation starting with foo-, e.g. foo-aac) on Zaratan, you will have access to a directory tree starting at /scratch/zt1/project/foo. This directory is, by default, only accessible to members of the associated project. The managers of the project will have write permission to this directory. Underneath it, there will be two directories by default: shared and user.

By default, all members of the project will have read-write access to the shared directory for the project. This is intended to facilitate collaboration between members of the research team. If there are static data files to be shared among the team, but which should be read only, you can place them here, but it is recommended that you remove the group write permission on such data to prevent other users from accidentally overwriting the data.
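Removing group write permission from a static dataset might look like the following sketch. A throwaway directory stands in for the project's shared directory; the names reference_data and table.csv are hypothetical, and GNU stat is assumed.

```shell
# A throwaway directory stands in for the project's shared directory.
shared_dir=$(mktemp -d)
mkdir -p "$shared_dir/reference_data"
echo "static content" > "$shared_dir/reference_data/table.csv"   # hypothetical file
chmod -R g+rwX "$shared_dir/reference_data"    # start out group-writable

# Remove group write permission from the dataset and everything beneath it,
# leaving group read access in place.
chmod -R g-w "$shared_dir/reference_data"

# Symbolic modes after the change.
stat -c '%A %n' "$shared_dir/reference_data" "$shared_dir/reference_data/table.csv"
group_bits=$(stat -c '%A' "$shared_dir/reference_data/table.csv" | cut -c5-7)
```

Removing write permission on the directory as well as on the files matters, because write permission on a directory is what allows files inside it to be deleted or renamed.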

Every user in the project receives a "personal" directory under the user subdirectory. By default, this directory and all that is underneath it is only readable by the user it is named after, however the contents are group-owned by the Unix group for the research project. The user can opt to grant access to specific subdirectories of their "personal" directory to the entire project or to select subsets.

To facilitate access to your "personal" directory, the system will by default create a symlink in your home directory named scratch.foo (where foo is the name of the project under the /scratch/zt1/project directory). This link is a "pointer" to your "personal" scratch directory for the specified project, similar to a shortcut on Windows. You can cd to it, or use it in paths, and it will be resolved to your personal directory in scratch space. E.g., if you have a file /scratch/zt1/project/foo/user/YOURUSERNAME/somedir/somefile.txt, the command cat ~/scratch.foo/somedir/somefile.txt will output the contents of the file. Note that there is only a single copy of the file: if you do rm ~/scratch.foo/somedir/somefile.txt, that file will be deleted, and will no longer be accessible under either path.

Since most people only belong to a single project, we also create a shortened symlink scratch in your home directory, which is the same as scratch.foo if you only belong to a single project. If you belong to multiple projects (e.g. foo and bar), then scratch will still be defined, and it will point to either scratch.foo or scratch.bar, depending on the order in which you were added to the two projects. You can see which by issuing the command ls -l ~/scratch (do not give a trailing '/' to scratch). To change what it points to, you can issue an ln -sfn command; e.g. if it points to foo but you want it to point to bar, you can give the command ln -sfn ~/scratch.bar ~/scratch (the -n flag makes ln replace the link itself rather than create a new link inside the directory it currently points to). Again, even though there are multiple paths that you can use to reach the data, only one copy exists.
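Inspecting and repointing the short link can be tried safely with the sketch below. It uses a throwaway directory standing in for your home directory and two stand-in scratch directories (personal_foo and personal_bar are hypothetical names), and assumes GNU ln, whose -n option stops it from following an existing link to a directory.

```shell
# A throwaway directory stands in for your home directory; foo and bar are
# the placeholder project names used in the text.
demo_home=$(mktemp -d)
mkdir -p "$demo_home/personal_foo" "$demo_home/personal_bar"   # stand-ins for scratch dirs
ln -s "$demo_home/personal_foo" "$demo_home/scratch.foo"
ln -s "$demo_home/personal_bar" "$demo_home/scratch.bar"
ln -s "$demo_home/scratch.foo" "$demo_home/scratch"

# Where does the short link point now? (no trailing '/')
ls -l "$demo_home/scratch"
before=$(readlink "$demo_home/scratch")

# Repoint it. -n stops ln from following the existing link into the directory
# it refers to, so the link itself is replaced.
ln -sfn "$demo_home/scratch.bar" "$demo_home/scratch"
after=$(readlink "$demo_home/scratch")
echo "scratch: $before -> $after"
```

Without -n, GNU ln treats the existing link as a directory and places a new link named scratch.bar inside it, leaving the short link unchanged.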

Juggernaut still uses a user-based directory structure for its scratch filesystem, but that will likely change soon.

Filesystem types for scratch

There are two main technologies used at UMD for the high performance scratch filesystems, and the cluster which you are using determines which technology is employed:

The BeeGFS filesystem is used on Zaratan.
The Lustre filesystem is used on Juggernaut (and was used on the Deepthought clusters).

The Zaratan cluster provides 2 PB of BeeGFS based high performance scratch storage, and Juggernaut provides 1.5 PB of Lustre based scratch storage.

For the most part, you can use either filesystem without really paying attention to the underlying technology. However, to avail yourself of the more advanced features, like:

viewing your disk usage and/or quota
configuring striping and other techniques for optimizing performance

you need to know the filesystem specific commands to do such. These are discussed in more detail below, by filesystem:

Using BeeGFS
Using Lustre

Using BeeGFS scratch space

The BeeGFS filesystems are NOT BACKED UP. Any valuable data should be copied elsewhere (home directory or off cluster) to prevent loss of critical data due to hardware issues. You are responsible for backing up any valuable data.

This section discusses the usage of the BeeGFS scratch filesystem found on some UMD HPC clusters. The Zaratan HPC cluster has a 2 PB BeeGFS based scratch filesystem.

Scratch quotas on BeeGFS
Permissions and ACLS for BeeGFS
Striping on BeeGFS

Scratch Quotas (BeeGFS)

The scratch filesystems have quota limits in place to prevent excessive use. However, to ensure there is adequate space for everyone using the cluster, this space is still only to be used for temporarily storing files in active use for jobs and projects currently running on the system. I.e., when your job is finished, remove all the data files, etc. that are no longer needed. See the sections on archival storage and SHELL storage for a discussion of some of the storage options available to you if you need to retain the data for longer periods of time.

To ensure adequate scratch space remains available for all users, the scratch filesystems are subject to an automatic purge policy. Files older than 90 days are subject to automatic removal without warning. Data that needs to be kept on the cluster for longer periods of time should be kept in the medium-term SHELL storage. Users are responsible for moving their own data between the various filesystem types.
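To see which of your files might fall under the 90 day limit, something like the following sketch can help. It assumes that file age is judged by modification time, which the policy text does not specify, and it assumes GNU touch and find. A throwaway directory with one fresh and one backdated file stands in for your scratch directory; recent.dat and stale.dat are placeholder names.

```shell
# A throwaway directory stands in for your scratch directory.
scratch_dir=$(mktemp -d)
touch "$scratch_dir/recent.dat"
touch -d '100 days ago' "$scratch_dir/stale.dat"    # GNU touch; backdates the file

# List regular files last modified more than 90 days ago, with their dates.
find "$scratch_dir" -type f -mtime +90 -exec ls -l {} +

# Count them, e.g. to decide whether a clean-up or a move to SHELL is due.
stale_count=$(find "$scratch_dir" -type f -mtime +90 | wc -l)
echo "files older than 90 days: $stale_count"
```

On the cluster, the same find command could be pointed at your personal scratch directory (e.g. ~/scratch) instead of the stand-in directory.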

Every allocation in a project has an associated amount of scratch storage included in the allocation (although such may be 0). The scratch allotments from each of the allocations underneath a project are summed together to obtain a combined total scratch quota for the project. All members of any allocation underneath the project receive equal access to the entire combined scratch quota.

Note that by default, we only apply quotas at the project level, so e.g. if a project lists a 1 TB scratch quota, that means that the combined scratch usage of all members of that project must not exceed 1 TB. If your colleagues are already consuming that 1 TB, then there is nothing left for you. Such matters are best worked out at the research group level; preferably by the team members involved, with the PI and/or project managers stepping in if needed. If necessary, we might be able to apply per user quotas for the problematic users, but we prefer not to do so if possible. By default, there are no per user quotas on scratch space.

You can check the scratch filesystem quota and usage for the projects to which you belong with the scratch_quota command. Without any arguments, this command will display the disk space quota and usage for all projects you belong to, and your personal usage. Most people only belong to a single project, in which case the output would look like

login.zaratan.umd.edu> scratch_quota 
# Group quotas
          Group name     Space used    Space quota   % quota used
            zt-test      532.462 GB       1.100 TB         48.41%
# User quotas
           User name     Space used    Space quota   % quota used  % of GrpTotal
             payerle     316.717 GB      unlimited              0         59.48%

In the example above, the user belongs to a single project, with a corresponding Unix group zt-test, which has a scratch quota of 1.1 TB. The members of the project have a combined usage of 532 GB, which is 48% of the quota. In the User quotas section, we see the username of the user who ran the command (payerle), and that he is using 317 GB, which is 59% of the total usage of the project (the 532 GB above). The unlimited under the Space quota column for that user means that there is not a user-specific quota being applied; this user is still subject to the group quota (1.1 TB in this case). The % quota used column displays the usage as a percentage of the quota; for the user line, since there is no user level quota, this value is zero. If a user level quota were imposed, it would show how much of that quota has been used.
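The two percentages in the example output can be reproduced from the other columns. The sketch below simply redoes that arithmetic with awk, using the figures from the example; the variable names are chosen for illustration and are not part of scratch_quota.

```shell
# Figures taken from the example output above (SI units: 1 TB = 1000 GB).
group_used_gb=532.462
group_quota_gb=1100      # 1.100 TB
user_used_gb=316.717

# Group usage as a percentage of the group quota.
pct_quota=$(awk -v u="$group_used_gb" -v q="$group_quota_gb" 'BEGIN { printf "%.2f", 100 * u / q }')

# The user's usage as a percentage of the group's total usage.
pct_of_group=$(awk -v u="$user_used_gb" -v g="$group_used_gb" 'BEGIN { printf "%.2f", 100 * u / g }')

echo "% quota used:  ${pct_quota}%"
echo "% of GrpTotal: ${pct_of_group}%"
```

The results, 48.41% and 59.48%, match the % quota used and % of GrpTotal columns in the example.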

If you belong to multiple projects, the scratch_quota command without arguments will list the usage for each such project in the Group quotas section, along with a line totaling them all. The % of GrpTotal column in the users section will be in reference to that total.

The scratch_quota command has a fair number of optional flags that can be given to control its behavior; these can be enumerated in full with the --help flag. Some flags you might find useful:

--users: As the primary quotas for the scratch filesystem are project based, it is sometimes useful to see the usage of other members of the project. If this option is given, the command will list all members (of all projects) in the users section. Note that the usage given for a user represents the total usage of that user across all projects the user belongs to --- there is no simple way to separate out a user's usage for a given project. Because of this, it is possible to get results which could be confusing at first glance. E.g., if your project has a 1 TB quota, and a user lisa belongs to your group and to another group with a 10 TB quota, then if lisa is using 0.1 TB with your group but 2 TB with the other group, the command will show her usage as 2.1 TB, which is more than double the quota for your group. This needs to be kept in mind when interpreting the output. Most users, however, only belong to a single project.
--show-files: While we have thus far only discussed the quota in terms of the amount of disk space, there also is a quota in terms of the number of files allowed. The scratch filesystem is optimized to handle large files; very large numbers of small files can be problematic for performance. To catch such problems before they stall the system, we limit the number of files per TB of disk storage. We are still working out the thresholds at which things become problematic, and we will automatically increase the file count limits for groups as they approach our more conservative limits, but we may contact you if there seems to be a problematic trend. This flag will display the file count limits and usage.

If you are comparing the usage, etc., returned by the scratch_quota command with the usage reported by the du command, please note that there are multiple ways to compute disk usage due to complications like internal fragmentation, indirect blocks, sparse files, etc. The scratch_quota command takes the usage numbers straight from the underlying quota system. The du and other Unix commands use somewhat different algorithms, which sometimes results in significantly different values. Also, the quota system and scratch_quota command use SI units (e.g. 1 GB = 10^9 bytes, and similarly for other prefixes), whereas the du command (and most Unix commands) by default use binary units (1 GiB = 1024^3 bytes); see the glossary entries for GiB and GB for more information. If you use the du command, it is recommended that you add the flags --si --apparent-size to get more directly comparable results.
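Sparse files are one easy way to see how two measures of disk usage can disagree. The sketch below, assuming GNU truncate and du, creates a sparse file in a throwaway directory and reports both its apparent size and the space actually allocated; sparse.dat is a placeholder name. The allocated figure depends on the filesystem, so no particular value is claimed for it.

```shell
# Create a 10 MiB sparse file: it has a length, but no data has been written.
demo_dir=$(mktemp -d)
truncate -s 10M "$demo_dir/sparse.dat"

# Apparent size: the length of the file in bytes.
apparent_bytes=$(du --apparent-size --block-size=1 "$demo_dir/sparse.dat" | cut -f1)

# Disk usage: the space actually allocated, which is what du reports by default.
allocated_bytes=$(du --block-size=1 "$demo_dir/sparse.dat" | cut -f1)

echo "apparent size: $apparent_bytes bytes"
echo "allocated:     $allocated_bytes bytes"
```

On filesystems that support sparse files, the allocated figure is typically far smaller than the apparent size, which is why the choice of --apparent-size changes what du reports.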

BeeGFS Permissions and Access Control Lists (ACLS)

As the HPC clusters are intended for research, all content in your personal scratch directory, like all other data on the cluster, is considered to be research related, and is at some level considered property of the University and the faculty PIs of the allocations through which you receive access to the cluster.

Every project has a Unix group associated with it, and every member of any allocation under the project is a member of that Unix group. The scratch space for a project is group owned by this Unix group. The root of the project directory and the user directory are readable (but not writable) by any member of the group (the primary PI of the project does have write permission on the root directory).

The shared directory, by default, gives read-write access to all members of the project's Unix group; as the name implies, this directory is intended to allow for easy sharing of files between members of the project. This is a good place to store datasets needed for jobs for multiple users in the group, etc.

The user directory contains personal subdirectories for every member of an allocation underneath the project, each named after the user. These directories are group owned by the Unix group for the project, although by default the permissions are set so that only the owning user can read and/or write to files underneath the personal subdirectory. NOTE: while we describe these subdirectories as "personal", please note that all content on the HPC clusters, including the "personal" scratch directories, is considered research data and is under the ownership of the PI(s) of the project. I.e., systems staff will grant any requests by the PI of the project for access to this data.

If you wish to allow access to files in your "personal" scratch directory to all other members of your group, this can be done by simply changing the Unix permissions on the files and directories. To grant read access to a file, you can use the chmod g+r command on the file. Note that you also need to grant the g+rx permission to all parent directories of the file being shared; please note that this will allow others in the group to list all files, as well as read any files with read permission set. For example, if you have the username testuser and belong to the test project and you wish to share a file testfile under the testdir directory of your personal scratch space with the rest of your group, you could do something like:

chmod g+r /scratch/zt1/project/test/user/testuser/testdir/testfile
chmod g+rx /scratch/zt1/project/test/user/testuser/testdir
chmod g+rx /scratch/zt1/project/test/user/testuser

The chmod in the above is the name of the command being used (change mode). The g+r and g+rx arguments instruct the command to change the group permission of the file to add the read (r) or read plus execute (rx) permission to the specified file or directory. On directories, the read permission allows listing the names of the files in the directory, while the execute permission (which on regular files means one can execute the file) grants the ability to enter the directory and access the files within it.

You can provide write access to a file and/or directory in a similar fashion, just adding a "w" permission in addition to the "r" permission for files. So to grant write permission to the file testfile, you can use the command chmod g+rw testfile; note that to grant write permission to the file testfile, you only need to grant rx permission to the parent directories (i.e. you do not need to grant write permission to the directories). You can grant write permission to a directory with the command chmod g+rwx testdir; note that this will give all members of your Unix group the ability to create, edit, and delete any files in that directory. We recommend that you grant write permission only sparingly.

The above scenarios are limited to granting access to the entire Unix group for your project. While this is probably the most common case, there are cases wherein you wish to grant access only to a select group of users in your Unix group (as opposed to all users in the group), and/or to select users which are not part of your project. In these cases, you can use POSIX Access Control Lists (ACLs) to grant permissions. Using the same testfile as above, if you wished to grant read access to that file to a user someuser, you could use a command like:

> setfacl -m u:someuser:r /scratch/zt1/project/test/user/testuser/testdir/testfile
> setfacl -m u:someuser:rx /scratch/zt1/project/test/user/testuser/testdir
> setfacl -m u:someuser:rx /scratch/zt1/project/test/user/testuser
>
> getfacl /scratch/zt1/project/test/user/testuser/testdir/testfile
getfacl: Removing leading '/' from absolute path names
# file: scratch/zt1/project/test/user/testuser/testdir/testfile
# owner: testuser
# group: zt-test
user::rwx
user:someuser:r--
group::---
mask::r--
other::---

The -m flag instructs setfacl that you wish to modify the ACL, and the u:someuser:r or u:someuser:rx argument means grant the user (u) someuser the read (r) or read+execute (rx) permission. The final getfacl lists the ACLs on the file. In this case it shows that the normal Unix permissions are all permissions for the user owning the file (user::rwx) and no permissions for the group owner (group::---) or others (other::---), along with the ACL granting user someuser the read permission (user:someuser:r--).

Note that in the above example, we were assuming that someuser was a member of the test project, and therefore already had read access to the /scratch/zt1/project/test/user directory. Remember that in order to be effectively granted read or write access to a file or directory, the user needs to have access (read and execute permission, as in the commands above) to every parent directory in the chain. While all Zaratan users have read access to the /scratch/zt1/project directory, the project specific directories by default only grant read access to members of the project. So if someuser is not a member of the test project, ACLs will need to be set on some project level directories as well, which will require approval from the PI or managers of the project (as such could potentially adversely impact the security of the directories of others in the project).
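One way to check the whole chain of parent directories is to walk up from the file and print the permissions of each component. The sketch below does this on a throwaway directory tree standing in for the project directory, so the paths and the variable names are illustrative only; GNU stat is assumed.

```shell
# Build a throwaway tree standing in for
# .../project/test/user/testuser/testdir/testfile from the example above.
root=$(mktemp -d)
mkdir -p "$root/user/testuser/testdir"
touch "$root/user/testuser/testdir/testfile"
target="$root/user/testuser/testdir/testfile"

# Walk from the file up to the top of the tree, printing the permissions of
# every component; each directory must be accessible to the other user.
path=$target
levels=0
while [ "$path" != "$root" ] && [ "$path" != "/" ]; do
    stat -c '%A %U:%G %n' "$path"
    path=$(dirname "$path")
    levels=$((levels + 1))
done
stat -c '%A %U:%G %n' "$root"
```

Any directory in the listing that lacks the needed permission for the other user blocks access to everything beneath it, whatever the permissions on the file itself. Where it is installed, the util-linux command namei -l gives a similar listing for a path.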

BeeGFS Striping

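BeeGFS supports striping, in which the chunks of a file are spread across multiple storage targets. This applies to the scratch filesystem under /scratch/zt1.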

When a file is created, it inherits the striping information from the parent directory. You can view the striping parameters for an existing directory with the command:

bash-4.4$  beegfs-ctl --getentryinfo /scratch/zt1/project/test/user/payerle/testdir
Entry type: directory
EntryID: 38-634F1DC8-2
Metadata node: mds-2 [ID: 2]
Stripe pattern details:
+ Type: RAID0
+ Chunksize: 512K
+ Number of storage targets: desired: 4
+ Storage Pool: 1 (Default)

In this example, the directory /scratch/zt1/project/test/user/payerle/testdir is striped across 4 disk arrays (the line Number of storage targets) with a chunk size of 512 kB. You can also use the same command to examine the striping pattern of a file, e.g.

bash-4.4$ dd if=/dev/zero bs=1k of=/scratch/zt1/project/test/user/payerle/testdir/testfile count=1024
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0666556 s, 15.7 MB/s
bash-4.4$ ls -l /scratch/zt1/project/test/user/payerle/testdir/testfile
-rw-r--r--. 1 payerle zt-test 1048576 Oct 18 17:46 /scratch/zt1/project/test/user/payerle/testdir/testfile
bash-4.4$  beegfs-ctl --getentryinfo /scratch/zt1/project/test/user/payerle/testdir/testfile 
Entry type: file
EntryID: 6B6-634F1E85-2
Metadata node: mds-2 [ID: 2]
Stripe pattern details:
+ Type: RAID0
+ Chunksize: 512K
+ Number of storage targets: desired: 4; actual: 4
+ Storage targets:
  + 504 @ oss-5 [ID: 501]
  + 105 @ oss-1 [ID: 101]
  + 204 @ oss-2 [ID: 201]
  + 405 @ oss-4 [ID: 401]
bash-4.4$

Currently users are not allowed to change the striping pattern, even for their own files. This is because doing so improperly can significantly degrade performance of the scratch file system as a whole. The default settings should be good for most use cases; however, if you believe you have a situation which would benefit from a non-default striping pattern, please contact systems staff and explain the situation.

Using lustre

Lustre is a high performance distributed file system designed for HPC clusters. Files are distributed among multiple servers; in some cases, different parts of the same file are even stored on different servers. Spreading the load across multiple file servers allows for the faster responses to file requests needed to handle the heavy load some parallel codes generate.

The lustre filesystems are NOT BACKED UP. Any valuable data should be copied elsewhere (home directory or off cluster) to prevent loss of critical data due to hardware issues. You are responsible for backing up any valuable data.

Every user is provided a personal lustre directory when their account on the cluster is created. The location of this directory varies a bit from cluster to cluster. For a user with username username, their personal lustre directory is located at:

On the Juggernaut cluster: at /lustre/jn10/username.

Your lustre directory is visible from the login nodes, data transfer nodes, AND from all of the compute nodes.

For the most part, you can use lustre as you would any other filesystem; the standard unix commands work, and you should just notice better performance in IO heavy codes.

Normally, lustre will keep the data for an individual file on the same fileserver, but will distribute your files across the available servers. The lfs getstripe and lfs setstripe commands can be used to control the striping. More information can be found in the section on Lustre and striping.

Lustre stores the "metadata" about a file (its name, path, etc) separately from the data. Normally, the IO intensive applications contact the metadata server (MDS) once when opening the file, and then contact the object storage servers (OSSes) as they do the heavy IO. This generally improves performance for these IO heavy applications.

Certain common interactive tasks, e.g. ls -l, require data from both the MDS and the OSSes, and take a bit longer on lustre. Again, these are not the operations lustre is optimized for, as they are not done frequently in IO heavy codes.

The lfs find command is a version of find optimized for lustre. It tries to avoid system calls that require information from the OSSes in addition to the MDS, and so generally will run much faster than the unoptimized find command. Usage is by design similar to the standard find command.
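
As an illustration of the find-like usage, here is a minimal sketch that lists regular files above a size threshold. The function name big_files is hypothetical, and the fallback to plain find when lfs is not installed is our own addition so the sketch can be tried off-cluster; the --type and --size options are assumed to behave as described in the lfs find documentation for your Lustre version.

```shell
# big_files is a hypothetical helper name.
# Usage: big_files DIRECTORY [MIN_SIZE]   (MIN_SIZE like 1G or 500M, default 1G)
big_files() {
    dir="$1"
    min="${2:-1G}"
    if command -v lfs >/dev/null 2>&1; then
        # Lustre-aware search; avoids contacting the OSSes where possible.
        lfs find "$dir" --type f --size "+$min"
    else
        # Assumption: fall back to standard find when lfs is unavailable.
        find "$dir" -type f -size "+$min"
    fi
}
# Example, using the Juggernaut path pattern from above:
#   big_files /lustre/jn10/username 10G
```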

If you want to see how much space you are currently using in any of the Lustre filesystems, run the command lustre_usage. This will show you total usage for yourself and for any groups you belong to. Note that this will only show you Lustre usage, and will not include any files outside of Lustre.

login-1:~: lustre_usage
Usage for /export/lustre_1:
======================================================================

Username     Space Used   Num Files   Avg Filesize
------------------------------------------------------------
rubble             2.3T     4134684    607.7K

Group        Space Used   Num Files   Avg Filesize
------------------------------------------------------------
flint              4.6T     6181607    795.4K

Lustre and striping

As mentioned previously, lustre gets its speed by "striping" files over multiple Object Storage Targets (OSTs); basically multiple fileserver nodes each of which holds a part of the file. This is mostly transparent to the user, so you would not normally know if/that your file is split over multiple OSTs.

By default on the Deepthought clusters, every file is kept on a single OST, and this striping just means that different files are more or less randomly spread across different file servers/OSTs. This is fine for files of moderate size, but might need adjustment if dealing with files of size 10 or 100 GB or more. The lfs getstripe and lfs setstripe commands exist for this.

The getstripe subcommand is the simplest, and just gives information about the striping of a file or directory. Usage is just lfs getstripe FILEPATH and it prints out information about the named file's striping. E.g.:

login-1> lfs getstripe test.tar
test.tar
lmm_stripe_count:   1
lmm_stripe_size:    1048576
lmm_pattern:        1
lmm_layout_gen:     0
lmm_stripe_offset:  9
        obdidx           objid           objid           group
             9         2549120       0x26e580                0
login-1>

The above example shows a file created using default settings. The file in this case is on a single OST (the number of stripes for the file, given by lmm_stripe_count, is 1). The lmm_stripe_offset gives the index of the starting OST, in this case 9, and below that are shown all the stripes (in this case, just the single one). One can use the command lfs osts to correlate the index to the name of an actual OST. The lmm_stripe_size value is the size of the stripe, in bytes, in this case 1048576 bytes or 1 MiB.

While examining a file's striping parameters is nice, it is not particularly useful unless one can also change it, which can be done with the lfs setstripe subcommand. Actually, the striping for a file is NOT MUTABLE, and is set in stone at the time of file creation. So one needs to use the setstripe subcommand before the file is created. E.g., to create our test.tar file again, this time striped over 20 OSTs and using a stripe size of 10 MiB, we could do something like:

login-1> rm test.tar
login-1> lfs setstripe -c 20 -S 10m test.tar
login-1> ls -l test.tar
-rw-r--r-- 1 payerle glue-staff 0 Sep 18 17:02 test.tar
login-1> tar -cf test.tar ./test
login-1> ls -l test.tar
-rw-r--r-- 1 payerle glue-staff 8147281920 Sep 18 17:04 test.tar
login-1> lfs getstripe test.tar
test.tar
lmm_stripe_count:   20
lmm_stripe_size:    10485760
lmm_pattern:        1
lmm_layout_gen:     0
lmm_stripe_offset:  55
        obdidx           objid           objid           group
            55       419995932     0x1908a11c                0
            63       468577296     0x1bedec10                0
            45       419403761     0x18ff97f1                0
            68       435440970     0x19f44d4a                0
            57       409176967     0x18638b87                0
            44       377767950     0x1684480e                0
            61       419414421     0x18ffc195                0
            65       356701609     0x1542d5a9                0
            31       408705898     0x185c5b6a                0
            12       429746020     0x199d6764                0
            50       379985276     0x16a61d7c                0
            16       372211487     0x162f7f1f                0
            46       468289628     0x1be9885c                0
            10       402610097     0x17ff57b1                0
            30       425031271     0x19557667                0
            60       423186185     0x19394f09                0
            69       496205056     0x1d937d00                0
            35       409685517     0x186b4e0d                0
            70       415859549     0x18c9835d                0
            15       449399811     0x1ac94c03                0

We start by deleting the previously created test.tar; this is necessary because one cannot use lfs setstripe on an existing file. We then use the -c option to setstripe to set the stripe count, and the -S option to set the stripe size, in this case 10 MiB. One can also use the suffixes 'k' for KiB, or 'g' for GiB. The setstripe creates an empty file with the desired striping parameters. We then issue the tar command to put content in the file, and then run the getstripe subcommand to confirm the file has the correct striping.

As mentioned before, one cannot use the setstripe subcommand on an existing file. So what if we want to change the striping of an existing file? E.g., what if we decide now we want test.tar to have 5 stripes of size 1 GiB? Because we cannot directly change the striping of an existing file, we need to use setstripe to create a new file with the desired striping, and copy the old file to the new file (you can then delete the old file and rename the new file to the old name if desired). E.g.

login-1> lfs getstripe test.tar
test.tar
lmm_stripe_count:   20
lmm_stripe_size:    10485760
lmm_pattern:        1
lmm_layout_gen:     0
lmm_stripe_offset:  55
        obdidx           objid           objid           group
            55       419995932     0x1908a11c                0
            63       468577296     0x1bedec10                0
	...
login-1>  ls -l test2.tar
ls: cannot access test2.tar: No such file or directory
login-1> lfs setstripe -c 5 -S 1g test2.tar
login-1> ls -l test2.tar
-rw-r--r-- 1 payerle glue-staff 0 Sep 18 17:16 test2.tar
login-1> cp test.tar test2.tar
login-1> ls -l test2.tar
-rw-r--r-- 1 payerle glue-staff 8147281920 Sep 18 17:17 test2.tar
login-1> diff test.tar test2.tar; echo >/dev/null Make sure they are the same
login-1> lfs getstripe test2.tar; echo >/dev/null Verify striping
test2.tar
lmm_stripe_count:   5
lmm_stripe_size:    1073741824
lmm_pattern:        1
lmm_layout_gen:     0
lmm_stripe_offset:  61
        obdidx           objid           objid           group
            61       419416513     0x18ffc9c1                0
            31       408708503     0x185c6597                0
            66       422684037     0x1931a585                0
            49       429032715     0x1992850b                0
            16       372213361     0x162f8671                0
login-1> rm test.tar; mv test2.tar test.tar
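
The copy-and-rename procedure above can be wrapped in a small function. This is a minimal sketch under the assumptions already shown in the transcript (lfs setstripe with -c and -S, followed by cp into the pre-created file); restripe is a hypothetical helper name, not a Lustre command, and the temporary file name is our own choice.

```shell
# restripe is a hypothetical helper that scripts the manual steps shown above.
# Usage: restripe FILE STRIPE_COUNT STRIPE_SIZE   (e.g. restripe test.tar 5 1g)
restripe() {
    file="$1"
    count="$2"
    size="$3"
    tmp="$file.restripe.$$"
    # Create an empty file with the desired striping.
    lfs setstripe -c "$count" -S "$size" "$tmp" || return 1
    # Copy the data into the pre-striped file and verify before replacing.
    if cp -- "$file" "$tmp" && cmp -s -- "$file" "$tmp"; then
        mv -- "$tmp" "$file"
    else
        rm -f -- "$tmp"
        return 1
    fi
}
```

The original is only replaced once cmp confirms the copy is identical, so a failed or interrupted copy leaves the original file untouched. Note that you need enough free space for both copies while the function runs.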

This only scratches the surface of what can be done with striping in lustre; for additional information look at:

SHELL (Medium term) storage

The SHELL storage tier on Zaratan is not automatically backed up. If you have critical data that must be saved, be sure to copy it elsewhere.

The SHELL storage tier is not accessible from the compute nodes. You can access it from the login nodes and/or from remote systems with AFS clients configured to access it. This is because the SHELL storage tier is not optimized for the demands of high performance computing, and the need for AFS tokens makes it difficult to effectively use in batch jobs.

In addition to your home and scratch spaces, you also will have (for each project you are a member of) space on the SHELL file system. This is a medium term storage tier, intended for the storage of data which will be needed for work done on the HPC cluster in the future, but is not needed for active jobs or jobs in the pipeline (i.e., the "future" in this case is longer than appropriate for storage on the scratch file system). This storage tier is much larger than the scratch storage system, and therefore it can hold data for longer periods of time. It is not backed up, nor are there any guarantees that the storage will last beyond the lifetime of the Zaratan cluster (5 years or so), and therefore it is not suitable for archival purposes. It is intended for the storage of large amounts of data related to research on the cluster, even if the data is not related to jobs which are running currently or in the near future.

Unlike the scratch file system, the SHELL file system is not designed for high performance --- this makes it significantly cheaper which is why we can afford more of it. For this reason, we do not make it accessible from the compute nodes, as a relatively small number of jobs doing heavy I/O could potentially overwhelm the file system. The recommended usage is along these lines:

Store large data sets needed for computations on your SHELL storage.
When such data sets will be needed as input for jobs that are to be run in the near future, copy the required data sets to scratch storage.
Run the jobs needing the data sets.
When the jobs finish, move any output files that need to be retained to SHELL storage, and delete the input files that you copied from SHELL. This ensures there is enough space on the scratch file system for the input data sets from your next set of jobs.
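
The workflow above can be sketched as a pair of shell functions. This is only an illustration: stage_in and stage_out are hypothetical helper names, the example paths are placeholders, and the commands must be run on a login node because SHELL is not accessible from the compute nodes.

```shell
# stage_in / stage_out are hypothetical helper names, not cluster commands.
# Run them on a login node with valid AFS tokens.

# stage_in SHELL_DATASET_DIR SCRATCH_DIR
# Copies a data set from SHELL into a scratch directory before submitting jobs.
stage_in() {
    mkdir -p -- "$2" && cp -r -- "$1" "$2"/
}

# stage_out SCRATCH_OUTPUT_DIR SHELL_DIR SCRATCH_INPUT_DIR
# Copies outputs to SHELL, then removes the scratch output and the staged
# input copy. Nothing is deleted unless the copy to SHELL succeeded.
stage_out() {
    mkdir -p -- "$2" && cp -r -- "$1" "$2"/ && rm -rf -- "$1" "$3"
}

# Example (placeholder paths):
#   stage_in  ~/SHELL.foo/dataset1 /scratch/zt1/project/foo/user/$USER
#   ... submit jobs and wait for them to finish ...
#   stage_out /scratch/zt1/project/foo/user/$USER/results ~/SHELL.foo/run1 \
#             /scratch/zt1/project/foo/user/$USER/dataset1
```

The original data set on SHELL is never modified by these functions; only the scratch copies are removed.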

Because the SHELL storage is protected by Kerberos, you must have valid Kerberos tickets/AFS tokens to access your files. If you are getting permission denied or similar errors, please see the section on the SHELL filesystem and AFS tokens.

The SHELL project directory structure

The SHELL storage tier uses a project based directory structure similar to that used by the scratch tier on Zaratan. Each project with an allocation of SHELL storage will have a directory on the SHELL file system at /afs/shell.umd.edu/project/foo where foo is a short name for the project.

As the HPC clusters are intended for research, all content under this project SHELL directory, like all other data on the cluster, is considered to be research related, and is at some level considered property of the University and the faculty PIs for the corresponding project.

By default, this root directory is readable by all members of the project, and not accessible by anyone else. Project managers are able to write to this directory, but other members of the project only have read access to the top level SHELL directory for the project. There are two subdirectories, share and user underneath that root directory by default.

Normally all users of the project have read-write access to the share directory. This directory is intended to facilitate collaboration among members of the research team. If you place data here which everyone should be able to read but which should not be overwritten/modified/etc., then you should remove the group write permissions on the data to prevent other users from accidentally overwriting the data.

In addition, every user in the project receives a "personal" directory under the user subdirectory. By default, this directory and all that is underneath it is only readable by the user it is named after, however you can easily grant access to specific subdirectories of your "personal" directory to the entire project or to select subsets.

For your convenience, the system creates a symbolic link SHELL.foo (where foo is the name of the project, as it appears under /afs/shell.umd.edu/project) in your home directory pointing toward your personal directory under the foo project's SHELL space. This is a pointer, similar to a short-cut on Windows; files placed under that symbolic link are actually on the SHELL storage tier --- there is only one copy of each file despite it having multiple paths, so if you modify/delete the file under one path, the change is visible in all paths. If you belong to multiple projects, there should be similar symbolic links for each project. The system also creates a symlink SHELL under your home directory, which points to the same location as one of the SHELL.foo links --- since most users only belong to a single project, the two symlinks will usually point to the same location.

If you belong to multiple projects, the SHELL symbolic link will point to your personal directory in the SHELL space for one of the projects you belong to, but the choice of which depends on the order in which you were added to the projects. You can use the command ls -l ~/SHELL to see which project's SHELL space it is using, and if you wish to change the symlink to point to the SHELL space of a different project, e.g. foo, you can use the command ln -sfn ~/SHELL.foo ~/SHELL (the -n flag makes ln replace the existing SHELL symlink itself, rather than creating a new link inside the directory it currently points to).
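
The following is a minimal sketch of repointing the SHELL symlink; use_shell_project is a hypothetical helper name. It passes -n to ln because, with GNU ln, an existing link name that is a symlink to a directory is otherwise followed, and the new link would be created inside that directory instead of replacing ~/SHELL.

```shell
# use_shell_project is a hypothetical helper name.
# Usage: use_shell_project PROJECT [HOME_DIR]
use_shell_project() {
    proj="$1"
    home="${2:-$HOME}"
    if [ ! -L "$home/SHELL.$proj" ] && [ ! -d "$home/SHELL.$proj" ]; then
        echo "no SHELL.$proj found in $home" >&2
        return 1
    fi
    # -s symbolic, -f replace existing, -n do not follow the existing symlink
    ln -sfn "$home/SHELL.$proj" "$home/SHELL"
    # Show where SHELL points now.
    ls -l "$home/SHELL"
}
# Example: use_shell_project foo
```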

The SHELL filesystem, volumes, and quotas

The underlying Auristor filesystem is a networked file system which is made available to machines outside the cluster. This filesystem is "volume" based; data is stored in units called volumes. These volumes look like normal directories, and for most purposes behave like directories, however not all directories are volumes. A major difference between directories and volumes is that volumes have an associated size limit; for various reasons the filesystem prefers a large number of smaller volumes rather than a smaller number of large volumes. Volumes may be nested in the filesystem, e.g. for a project bee-research with a user named johndoe, typically both /afs/shell.umd.edu/project/bee-research and /afs/shell.umd.edu/project/bee-research/user/johndoe will each be volumes. Files under /afs/shell.umd.edu/project/bee-research/user/johndoe, directly underneath or in some chain of subdirectories beneath that directory, will count towards the size limit of the johndoe volume (as long as none of the "directories" in the subdirectory chain is itself a new volume), but not towards the size limit of the bee-research volume.

For each project, the system will automatically create a volume at the root of the project's SHELL space, and one volume under the user subdirectory for each user belonging to the project. By default, all of these volumes have a 1 TB limit on the amount of data that can be stored in them. These caps on the size of a volume can be changed at the request of the PI or a designated manager of the project (just have them send email to hpcc-help@umd.edu). Just tell us which volume (i.e. the path to the "directory") and the new size. These same people can also request the creation of new volumes in the same fashion; just tell us where the volume should be mounted (i.e. the path to the volume "directory") and the desired size. As stated previously, the underlying filesystem prefers a large number of smaller volumes rather than fewer larger volumes. We are reluctant to create volumes larger than 20 TB, and prefer for them to be 10 TB or even smaller.

You can view the current quotas and usage of the SHELL space for projects you belong to with the shell_quota command. Note: you need to have valid AFS tokens in order to use this command. By default, the command shows a summary of the quota and usage of the SHELL storage tier for all projects you belong to, e.g.:

login-1:~$ shell_quota
SHELL storage rooted at /afs/shell.umd.edu/project/test:
Total Project Quota: 2.50 TB
Total Project Usage: 1.46 TB (58.6% of Quota)

SHELL storage rooted at /afs/shell.umd.edu/project/test2:
Total Project Quota: 1.00 TB
Total Project Usage: 11.26 kB (0.0% of Quota)

In the above example, the user running the command belongs to two projects, named test and test2. The test project has a SHELL quota of 2.5 TB, with about 1.5 TB (58.6% of the 2.5 TB quota) used. This 1.5 TB is the sum of the usages for all subvolumes of the root volume of the test project's SHELL storage tree. The test2 project has a 1 TB quota of which about 11 kB is used.

While this summary information is often all you wish to see, sometimes you will wish to see more detail, and in particular wish to see which subvolumes are contributing the most to the overall usage. You can use the same shell_quota command with the --show_volumes flag to see the summary plus information on usage and caps of each subvolume for the project. In the example below, we also add the --project test flag to restrict output to the test project, and the --verbose flag to display some additional information:

login-1:~$ shell_quota --show_volumes --project test --verbose
SHELL storage rooted at /afs/shell.umd.edu/project/test: (volumes p.test*)
Total Project Quota: 2.50 TB
Total Project Usage: 1.46 TB (58.6% of Quota)
Total Allocated    : 8.21 TB (328.5% of Quota)
Data from: 2023-03-08 09:56:28

Subvolumes: 9
Mountpoint                                                    MaxSize    % of      Disk     % of     % of
rel to root                                                    of vol  Prj Qta     Used Max Used  Prj Usg
------------------------------------------------------------ -------- -------- -------- -------- --------
<ROOT>                                                        1.02 TB    41.0% 15.36 kB     0.0%     0.0%
user/larry                                                    1.02 TB    41.0%  2.05 kB     0.0%     0.0%
user/moe                                                      1.02 TB    41.0% 938.21 GB    91.6%    64.1%
user/curly                                                    1.02 TB    41.0% 180.69 GB    17.6%    12.3%
user/groucho                                                  1.02 TB    41.0%  2.05 kB     0.0%     0.0%
user/harpo                                                    1.02 TB    41.0% 345.67 GB    33.8%    23.6%
user/harpo/rwtest                                            21.47 GB     0.9%  3.07 kB     0.0%     0.0%
user/chico                                                    1.02 TB    41.0% 797.70 kB     0.0%     0.0%
user/zeppo                                                    1.02 TB    41.0%  2.05 kB     0.0%     0.0%
login-1:~$

In this case, the output is restricted to the project test; note that you can only display information about projects you belong to. The summary at top is largely the same as before, with two additional lines due to the verbose flag. The first new line is the Total Allocated line, which lists the sum of the size limits for all volumes in the project, and compares it to the quota. In this case, the sum of the maximum sizes of all the subvolumes is 8.2 TB (8 volumes with 1.02 TB and one with 21 GB), which is 328% of the 2.50 TB quota. This is an example of oversubscription, which we discuss more below.

The other new line in the summary section due to the --verbose flag is the Data from line. The data used to produce the output of this command is cached and updated several times a day, and this line informs you when the data was last updated. If you recently added or deleted files, the impact will not be seen immediately.

Following the summary section is a list of all subvolumes belonging to the project and their usages. The first column lists the mountpoint of the volume, relative to the root mountpoint (/afs/shell.umd.edu/project/test in this example). The special case <ROOT> represents the root volume itself --- whereas the summary section sums up the total maximum volume sizes and usages for all volumes in the project, this entry lists the maximum size and usage just for the root volume. The next column, MaxSize of vol, lists the per volume size limit for the volume. The % of Prj Qta column shows that per volume size limit as a percentage of the total quota for the project; e.g. in this case the 1.02 TB volume limits are 41% of the project's 2.5 TB quota. The Disk Used column lists the usage for this specific volume. The % of Max Used column shows that usage as a percent of the maximum size of the volume, and % of Prj Usg shows that usage as a percent of the total usage for the project.

There are additional options you can provide to the shell_quota command; the aforementioned are just the ones we believe will be most useful to most people. Use the command shell_quota --help to see the full list of options.

Oversubscription is permissible; i.e. the sum of the volume size caps for the various volumes within a project's SHELL space may exceed the SHELL storage allocation "quota" for the project, as long as the sum of the actual disk usage for the various volumes of the project does not exceed that "quota". Such oversubscription even occurs automatically --- base allocations from the AAC generally only have 1 TB of SHELL storage allotted to them, and the system creates a root volume and volumes for each user with 1 TB volume size caps.

The operating system will only prevent users from significantly exceeding the per volume limits; there is no mechanism to warn or prevent users from exceeding the project's SHELL quota before it is exceeded. E.g., in the example above, both users larry and groucho have volumes with 1 TB per volume limits, and almost no usage. They can both add up to 1 TB of data to their respective volumes, and the operating system will not prevent it because it is within the 1 TB per volume limit. However, assuming there was not a drastic reduction in usage in other volumes, that would bring the total usage for the test project up to around 3.5 TB, significantly over the 2.5 TB quota.
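
The arithmetic behind this scenario can be checked with a short calculation. The sketch below uses the figures from the shell_quota example (2.5 TB quota, 1.46 TB current usage) and assumes two users with nearly empty volumes each add 1 TB; worst_case is a hypothetical helper name.

```shell
# worst_case is a hypothetical helper name.
# Usage: worst_case QUOTA_TB CURRENT_USAGE_TB ADDED_TB...
worst_case() {
    awk 'BEGIN {
        quota = ARGV[1]; total = ARGV[2]
        # Add each projected increase to the current usage.
        for (i = 3; i < ARGC; i++) total += ARGV[i]
        printf "%.2f TB used, %.0f%% of the %.2f TB quota\n",
               total, 100 * total / quota, quota
    }' "$@"
}

worst_case 2.5 1.46 1.0 1.0
# prints: 3.46 TB used, 138% of the 2.50 TB quota
```

Each 1 TB addition is within its own per volume limit, yet together they push the project well past its overall quota.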

If at some point the usage over all of the volumes for a project exceeds the project's SHELL allocation "quota", we will send email to the PI and managers of the project informing them of the situation and asking that the issue be rectified in a reasonable amount of time, typically one week. Rectifying the issue can include a combination of:

Deleting data which is no longer needed.
Moving data to off-cluster storage (and deleting the copies on the cluster once successfully moved.)
Increasing your project's SHELL allocation "quota". This step might involve:
- Requesting additional SHELL storage from the AAC
- Requesting additional SHELL storage from your college or departmental resource pool on Zaratan. Depending on the college/department, there might be costs involved in this (College/Departmental policies are decided upon by the College or Department --- the Division of IT is not involved in such).
- Purchasing additional storage from DIT

If you need help figuring out how to rectify the situation, or if there are extenuating circumstances which might warrant an extension of the time limit to resolve the matter, please let us know. While our responsibilities towards other users on the cluster will not allow us to ignore such overages, we are willing to work with you to find a mutually acceptable solution.

The SHELL filesystem: AFS tokens

The SHELL storage tier uses the Auristor File System, which is basically an enhanced version of the AFS filesystem that has been in use by the campus Glue system for many years. This is a global filesystem with clients available for all modern operating systems, which makes the content securely accessible (secured by Kerberos credentials) wherever it is needed.

Because the SHELL storage is protected by Kerberos, you must have valid Kerberos tickets/AFS tokens to access your files. These tokens will normally be created for you automatically when you log in using your password, or if you do a passwordless login using Kerberos tickets obtained on another system. They will not be created if you log in using RSA or other public key authentication --- this authentication method is simply unable to obtain Kerberos tickets for you. Please also note that Kerberos tickets and AFS tokens have expiration times, typically 8-24 hours after they are obtained. If your AFS tokens expire, you will no longer be able to access SHELL storage until you renew them.

If you do not have AFS tokens or they have expired, you can renew Kerberos tickets and AFS tokens by issuing the command renew on the cluster. This generally will require you to enter your UMD LDAP directory password.

If you have set up RSA or similar public key authentication for ssh to avoid entering a password every time you access the cluster, this becomes problematic. While you can still use such to access much of the cluster, you will need to use the kinit and aklog commands to get AFS tokens to access the SHELL storage, which defeats the goal of password-less login. Unfortunately, the nature of public key authentication simply makes it incapable of obtaining Kerberos tickets and AFS tokens for you.

A better approach is to install a Kerberos client on your workstation, and run the kinit command once on your workstation before opening sessions to the cluster. This will require you to use your password, but it is a once-a-day or so type of operation. You can then configure ssh to allow GSSAPI authentication when accessing the cluster login nodes, which will allow you to access the cluster without entering your password again. The forwarded Kerberos tickets can then be used to obtain AFS tokens for you, allowing you to access SHELL storage.
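
For an OpenSSH client, this approach might be set up as sketched below. This is an illustration rather than an official configuration: add_gssapi_host is a hypothetical helper name, and LOGIN_NODE_HOSTNAME and MYUSERNAME are placeholders for the login node's hostname and your own username. GSSAPIAuthentication and GSSAPIDelegateCredentials are standard OpenSSH client options; the second one forwards your Kerberos tickets to the login node.

```shell
# add_gssapi_host is a hypothetical helper name.
# Usage: add_gssapi_host CONFIG_FILE HOST USERNAME
# Appends a Host stanza enabling Kerberos (GSSAPI) login and ticket forwarding.
add_gssapi_host() {
    cat >> "$1" <<EOF
Host $2
    User $3
    GSSAPIAuthentication yes
    GSSAPIDelegateCredentials yes
EOF
}

# On your workstation (placeholders, not run here):
#   kinit MYUSERNAME@UMD.EDU
#   add_gssapi_host ~/.ssh/config LOGIN_NODE_HOSTNAME MYUSERNAME
#   ssh LOGIN_NODE_HOSTNAME
```

After kinit has been run once, later ssh sessions to that host should not prompt for a password until the Kerberos tickets expire.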

Accessing your SHELL space remotely

The Auristor filesystem used as the underlying filesystem of the SHELL storage tier is a potentially globally distributed file system which can securely provide access to files and data based on Kerberos credentials. What this means is that by installing the appropriate client software on your local workstation or laptop, you can access your SHELL space as if it were a local filesystem.

The process to do this depends on the OS running on your local workstation, and we split into different sections for different OSes:

Accessing your SHELL space from a Windows system
Accessing your SHELL space from a Mac system
Accessing your SHELL space from a Linux system

Accessing your SHELL space from a Windows system

Note:

contact systems staff

Download the latest version of the OpenAFS Client Installer from the Auristor website. It should normally display the installers most compatible with your system. Find the best match (you probably want the 64 bit installer), and click on the yellow button with a label starting with "yfs-openafs".
Download the installer, and then run it. If it does not start automatically, open the File Explorer application, go to your Downloads folder, and double click on the file (which should start with "yfs-openafs").
The installer will start with a license agreement. You should read the agreement, and if you have no objections, accept the agreement in order to continue with the installation.
The next page will ask for some options. Please set
1. Default Cell should be set to shell.umd.edu (for the SHELL storage tier on Zaratan)
2. Integrated logon should be set to Disable
3. Cache size: keep the default
The next page (Custom Setup) gives you options for what to install. Just use the defaults.
The next page is to confirm that you really wish to install. Click the Install button to proceed.
The Windows OS might also pop up a confirmation window asking if you wish to install new software. If so, click the Yes button to proceed.
The package should be installing, and a window with a progress bar will be displayed. When done, you can click on the Finish button to exit the setup wizard.
You will be prompted to restart the system to complete the installation.

After the system reboots, you can open a command prompt from the Start Menu and issue the command: kinit MYUSERNAME@UMD.EDU followed by aklog, replacing MYUSERNAME with your login name on Zaratan (which should be the part of your @umd.edu or @terpmail.umd.edu email address to the left of the "at" sign (@), and will normally be all lowercase). The @UMD.EDU must be all uppercase. This will give you Kerberos tickets on your Windows workstation. This kinit step will need to be repeated every time you reboot your workstation (at least if you plan to use password-less ssh in that session), or when your Kerberos tickets expire (typically one day).

The above steps installed the OpenAFS client on your system, and you have valid Kerberos tickets. We now discuss:

Configuring passwordless ssh via kinit from Windows systems
Mount SHELL directories on your Windows system

Configuring passwordless ssh via kinit from Windows systems

Although the above kinit step will obtain Kerberos tickets for you, you still need to configure your ssh client to authenticate to the remote system using these tickets. The steps to accomplish this depend on the specific ssh client you are using.

For the putty ssh client, do the following:

Start putty, and go to the configuration menu.
In the configuration menu, select SSH, then Auth, and then the GSSAPI pane. On this pane, make sure the two boxes Attempt GSSAPI authentication and Allow GSSAPI credential delegation are checked.
Then find Connection and then Data in the configuration menus, and in the field Auto-login username enter your username on the Zaratan cluster.

Mount SHELL directories on your Windows system

Although the above aklog step will obtain AFS tokens for you, you still need to mount the SHELL directory or directories on your workstation. To do this:

Go to Computer | Map Network Drive. This should open a Map Network Drive window.
There should be a drop down selector labeled Drive. You can use this to select which drive letter the SHELL folder should be mounted as.
Underneath that is a text box labeled Folder. This is where you provide the path to the SHELL folder you wish to mount. You should enter the full path to the directory, with forward slashes (i.e. '/') converted to backslashes ('\'), and two backslashes at the start. E.g., if you want to mount your personal SHELL directory from project foo, and your username is smith, you would enter \\afs\shell.umd.edu\project\foo\user\smith. Note that you cannot use the symlink ~SHELL or similar here.
You can leave the check boxes unchecked. The Reconnect at login box sounds tempting, but since you will not have AFS tokens at logon (not until you issue the kinit and aklog commands), I do not believe it will provide the functionality desired.
Click the Finish button.

This should result in the SHELL directory specified being mounted at the drive letter specified. You might wish to mount your personal directory and the share directory for the same project to different letters, or if you belong to multiple projects, mount your personal space from each project to a different drive letter.
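The path conversion described for the Folder text box can be sketched as a one-line helper. This is a minimal illustration in POSIX shell, so you could run it on the cluster and paste the result into the Windows dialog; the function name to_unc_path is hypothetical, and it assumes the input is the full path beginning with /afs.

```shell
# Convert a full AFS path into the form the Map Network Drive dialog expects.
to_unc_path() {
    # Replace every "/" with "\", then prepend one more "\" so the
    # result starts with two backslashes.
    printf '%s\n' "$1" | sed -e 's|/|\\|g' -e 's|^|\\|'
}

to_unc_path /afs/shell.umd.edu/project/foo/user/smith
# prints \\afs\shell.umd.edu\project\foo\user\smith
```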

Accessing your SHELL space from a Mac system

This section is still under construction.

You can download the OpenAFS Client Installer for Mac from Auristor.

Accessing your SHELL space from a Linux system

This section is still under construction.

You can download the OpenAFS Client Installer for Linux from Auristor.

Archival storage

A proper archival storage tier is for long-term storage of data, especially data which is infrequently accessed. This generally requires automatic backups and guarantees on the lifetime of the data. Unfortunately, the HPC clusters at UMD do not currently provide an archival storage tier. The BeeGFS and Lustre filesystems on the various HPC resources are intended for ongoing research on the cluster. These storage resources are limited and are not intended for long-term or archival storage, as they are needed for people to run jobs on the cluster. You are required to delete data that is no longer needed, and to move data that must be retained to another location once it is no longer needed to support queued or running jobs.

Campus provides users the ability to store large amounts of data via on-campus and cloud-based services, namely:

Isilon Networked Storage Service
Google G Suite Drive
UMD Box Service

Using the Campus Isilon Network Storage service for Archival Storage

Campus maintains a networked file storage system which can be accessed either by the NFS protocol (suitable for access by Unix-like systems) or the CIFS/SMB protocol (suitable for access by Windows-like systems).

Pricing and other information, along with links to the forms to request such service, can be found at the Networked Storage service catalog.

Using Google G Suite Drive for Archival Storage

Campus provides the ability to store large amounts of data on Google's G Suite drive. Please see the Google drive service catalog entry for more information, including restrictions on what data can be stored there and how to set up your account if you have not done so already.

The recommended utility for accessing Google Drive from the HPC cluster is the rclone command. In addition to supporting many cloud storage providers, it has features to help avoid exceeding Google's data rate limits.

The gdrive command is also available, but it tends to exceed Google's rate limits when moving large amounts of data.
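As a rough sketch of what an rclone upload might look like, the wrapper below assembles an rclone copy command with conservative rate settings. Several things here are assumptions rather than site policy: the remote name umd-gdrive is hypothetical (it stands for whatever name you chose when running rclone config), the function name archive_to_drive is hypothetical, and the --tpslimit and --transfers values are illustrative only. The wrapper prints the command by default and runs it only when DRY_RUN=0 is set.

```shell
# Copy a directory to a Google Drive remote with throttled request rates.
# Usage: archive_to_drive <local-dir> <path-on-remote>
archive_to_drive() {
    src=$1
    dest=$2
    # --tpslimit caps API transactions per second; --transfers caps parallel uploads.
    set -- rclone copy "$src" "umd-gdrive:$dest" --tpslimit 8 --transfers 4 --progress
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"      # just show the command
    else
        "$@"           # actually run rclone
    fi
}

archive_to_drive ./results archive/results
# With DRY_RUN unset, this prints:
# rclone copy ./results umd-gdrive:archive/results --tpslimit 8 --transfers 4 --progress
```

Running it once in the default print-only mode is a cheap way to check the source and destination paths before moving a large amount of data.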

Using Box for Archival Storage

Campus also provides the ability to store large amounts of data on the Box cloud-based storage platform. Please see the UMD Box service catalog entry for more information, including restrictions on what data can be stored there and how to set up your account if you have not done so already.

The UMD Box service can be accessed from the cluster in several ways. We recommend using the rclone command.

Alternatively, one could use an ftps client. NOTE: although similar in name and function, ftps is not the same as sftp. They are different protocols, and Box does NOT support sftp at this time. Probably the best command-line ftps utility is the lftp command.
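Whichever transfer tool you use, one common approach (not a site requirement) is to bundle a finished results directory into a single tarball with a checksum before uploading, so that the copy can be verified after it is later downloaded. The sketch below assumes GNU tar and sha256sum are available; the function name bundle_dir and the remote name box in the comment are hypothetical.

```shell
# Bundle a directory into <prefix>.tar.gz and write <prefix>.tar.gz.sha256 beside it.
# Usage: bundle_dir <directory> <output-prefix>
bundle_dir() {
    src=$1
    out=$2
    # Archive relative to the parent directory so the tarball holds "results/...",
    # not the full absolute path.
    tar -czf "$out.tar.gz" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    # Record the checksum with a relative file name so "sha256sum -c" works
    # from whatever directory the two files end up in.
    (
        cd "$(dirname "$out")" || exit 1
        name=$(basename "$out").tar.gz
        sha256sum "$name" > "$name.sha256"
    )
}

# Example (paths are placeholders):
#   bundle_dir ./results ./results-bundle
#   rclone copy ./results-bundle.tar.gz box:archive/
#   rclone copy ./results-bundle.tar.gz.sha256 box:archive/
# After downloading both files again:
#   sha256sum -c results-bundle.tar.gz.sha256
```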

Securing Your Data

Your home directory as configured is private and only you have access to it. Any directories you create outside your home directory are your responsibility to secure appropriately. If you are unsure of how to do so, please submit a help ticket requesting assistance.

If you're a member of a group, you'll want to make sure that you give your group access to these directories, and you may want to consider setting your umask so that any files you create automatically have group read and write access. To do so, add the line umask 002 to your .cshrc.mine file.
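To see what the umask setting changes, the sketch below creates a scratch file under two different umask values and prints the resulting permission string. It is written in POSIX shell purely as a demonstration (the umask command itself takes the same argument in csh-style shells); the function name perms_under_umask is hypothetical.

```shell
# Show the permissions a newly created file gets under a given umask.
# Runs in a subshell so the caller's own umask is left untouched.
perms_under_umask() {
    (
        umask "$1"
        tmpdir=$(mktemp -d) || exit 1
        touch "$tmpdir/newfile"
        ls -l "$tmpdir/newfile" | cut -c1-10   # keep only the mode column
        rm -rf "$tmpdir"
    )
}

perms_under_umask 022   # prints -rw-r--r--  (group can read but not write)
perms_under_umask 002   # prints -rw-rw-r--  (group can read and write)
```

Note that umask only affects files created after it is set; existing files and directories keep their current permissions until changed with chmod.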

If your jobs process sensitive data, it is strongly recommended that you submit all such jobs in exclusive mode to prevent other jobs/users from running on the same node(s) as your job.
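As a minimal sketch of an exclusive-mode submission, assuming the cluster's Slurm scheduler and its --exclusive option to sbatch, the helper below prints a small batch script that requests whole-node access. The function name make_exclusive_job, the task count, the time limit, and the script name process_sensitive_data.sh are all placeholders, and the umask 077 line is an extra precaution added here, not something the text above requires.

```shell
# Print a minimal Slurm batch script that requests exclusive use of its node(s).
# Usage: make_exclusive_job <command> [args...]
make_exclusive_job() {
    cat <<EOF
#!/bin/bash
#SBATCH --exclusive
#SBATCH --ntasks=1
#SBATCH --time=00:30:00

umask 077    # files the job creates are accessible only to you
$*
EOF
}

make_exclusive_job ./process_sensitive_data.sh
# Save the output to a file (e.g. exclusive_job.sh) and submit it with:
#   sbatch exclusive_job.sh
```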