On the cluster, you have several options available to you regarding where files are stored. This page discusses the various options and the differences between them in terms of performance, backup, quotas, security, and policies.
There are various different tiers of storage on an HPC cluster, differing in the amount of available storage, their performance, policies, etc. The basic tiers available on HPC systems at UMD are summarized and compared in the table below, to help you select the most appropriate tier to use for a given need. Click on the name in the tier column for more detail.
Storage Tier | Visibility | Technology | Performance | Size | Permanence | Suggested Use Cases |
---|---|---|---|---|---|---|
Home Directory | all nodes on cluster | NFS, backed by physical disk | standard | 10 GB/user | lifetime of cluster; backed up | 1) small precious data 2) build scripts/configuration for building codes |
Local /tmp directory | only the local compute node | Zaratan standard nodes: SATA SSD; Zaratan serial nodes: NVMe SSD | high performance | Zaratan standard nodes: >~ 1 TB/node; Zaratan serial nodes: >~ 10 TB/node | temporary; deleted after job ends | 1) temporary disk workspace for job 2) staging job inputs |
High Performance/Scratch filesystem | all nodes on cluster | Zaratan: BeeGFS | high performance | Zaratan: 2 PB total, quota-ed by group | short term; data for active jobs only; not backed up | 1) input data for jobs 2) output from jobs 3) checkpoint files 4) files should be deleted/moved once job completes |
SHELL medium term filesystem | Zaratan login nodes, any system with an Auristor/AFS client; not from compute or DTN nodes | Auristor (similar to AFS) | standard | Zaratan: 7.9 PB total, volume based quotas by group | medium term; not backed up | 1) storage of input data not needed for current jobs, but which might be needed in 6 months or so 2) storage of results from jobs that need to be retained for a year or two |
Archival storage | external to the Zaratan cluster; access from login or DTN nodes only | | slow | No archival storage is provided as part of the Zaratan cluster; we discuss options outside of the cluster below. | long term | 1) long term storage of data as required by grants or publications 2) long term storage of precious results |
|
While the Zaratan cluster maintains reasonable security, it is not certified or approved for the storage of sensitive information. No classified data, CUI, or HIPAA data are allowed on the cluster, nor any data at other than Low (Level 1) classification in the UMD Data Classification Table.
|
Your home directory is the directory you are placed in when you first log into the cluster. This filesystem is not optimized for high performance, so it should not be used for input or output files for jobs. It is, however, the only filesystem on the cluster which is backed up, so it is suitable for your most precious data; but because it is backed up, it is costlier than the other tiers and therefore is strictly limited in size.
Your home directory is by default private to you, and should be used as little as possible for data storage. In particular, you should NOT run jobs out of your home directory --- run your jobs from the scratch filesystem; this is optimized to provide better read and write performance to improve the speed of your job. After the job is finished, you might wish to copy the more critical results files back to your home directory or to a SHELL directory. Your home directory gets backed up nightly. (The scratch and SHELL filesystems are not backed up.)
|
Do not run jobs out of your home directory, or run jobs doing extensive I/O
from your home directory, as it is NOT optimized for that.
|
|
Your home directory is the ONLY directory that gets backed
up by the Division of IT. You should copy your precious, irreplaceable files
(custom codes, summarized results, etc) here.
|
|
While the Zaratan cluster maintains reasonable security, it is not certified or approved for the storage of sensitive information. No classified data, CUI, or HIPAA data are allowed on the cluster, nor any data at other than Low (Level 1) classification in the UMD Data Classification Table.
|
Topics related to home directories:
Home directories on the Zaratan cluster are limited by a 10 GB "soft quota" policy. Recognizing that the need for storage can sometimes vary dramatically over the span of a few days, we have adopted a policy with some flexibility in this regard. You can temporarily (up to a week) store up to double the 10 GB quota (i.e. up to 20 GB) in your home directory. But you must bring your usage under the 10 GB quota within seven days or you will not be able to store any more data in your home directory.
Note that unlike the group based quotas on the scratch and SHELL storage tiers, the quota on your home directory is personal. The amount of storage you have available in your home directory is solely influenced by the amount of storage you are using, and not by the usage of other members of your projects.
You can check your home space quota and usage with the command home_quota. Without any arguments, this command will show your home space usage, quota, percent used, files, file limit, and percent of file limit, along with the time of the end of your grace period if you are in a grace period, all in a format intended to be easy to read. There are a number of options to control the output, and you can use the -h flag to see an explanation of these.
If you are comparing the usage, etc., returned by the home_quota command with the usage reported by the du command, note that the home_quota command by default uses SI units (e.g. 1 GB = 10^9 bytes, and similarly for other prefixes), whereas the du command by default uses binary units (1 GiB = 1024^3 bytes) -- see the glossary entries for GiB and GB for more information. If you use the du command, it is recommended that you use it with the additional flags --si --apparent-size to get more directly comparable results.
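For example, to get a usage figure for your home directory that is directly comparable to the home_quota output:
du -s --si --apparent-size ~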
As the HPC clusters are intended for research, all content in your home directory, like all other data on the cluster, is considered to be research related, and is at some level considered property of the University and the faculty PIs of the allocations through which you receive access to the cluster. However, the default permissions on your home directory are such that only you (and systems staff) have access to the contents.
Certain directories, like ~/.ssh, may contain security related data (e.g. private keys for ssh public key authentication) which need to be kept private and readable only by you. You can use the standard Unix chmod command to alter the permissions on files and directories under your home directory to allow others access; however, we recommend you consider using scratch or SHELL storage for such sharing instead --- there are even shared folders in the scratch and SHELL spaces for the projects to which you belong to facilitate the sharing of data among members of the same project. If you do open up your home directory, please remember to ensure that more sensitive data (like the contents of your .ssh folder) remains readable only by you.
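For example, a minimal sketch of setting restrictive permissions on your .ssh directory (the key file name id_rsa is just illustrative; yours may differ):
chmod 700 ~/.ssh
chmod 600 ~/.ssh/id_rsa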
All of the compute nodes on the UMD HPC clusters have temporary local storage mounted at /tmp which is readable and writable by all users of the clusters. Being local, files stored in /tmp on a given node are only accessible by processes running on that same node; i.e. they are not accessible from other nodes, including the login nodes. But since it is local, it is not subject to network latency and bandwidth limitations, and so tends to be faster than many networked file systems, although the high performance scratch tier can sometimes outperform it. Performance on the /tmp filesystems also tends to be more consistent than scratch performance: the scratch filesystem is shared by many jobs and users over many nodes, which can impact performance at times, whereas the /tmp filesystem can only be impacted by the much smaller number of other processes running on the same node, many of which are usually part of your job.
This storage is temporary: any files you place in this directory will be deleted once you no longer have any jobs running on the node. A typical use case is staging a copy of job input data to /tmp; since it is just a copy, it is OK when it is deleted after the job completes.
|
All data in /tmp belonging to you will be deleted once your jobs on the compute node finish.
|
If you use the /tmp directory, it is advisable to make your own directory underneath /tmp, and set restrictive permissions so other users on the node cannot access the data. You might even wish to make a job specific directory, to reduce the chance of your jobs interfering with each other. You can do this with a code snippet like the following (in bash):
TMPDIR="/tmp/${USER}-${SLURM_JOB_ID}"
export TMPDIR
mkdir "$TMPDIR"
chmod 700 "$TMPDIR"
or, if your job script uses csh/tcsh:
setenv TMPDIR "/tmp/${USER}-${SLURM_JOB_ID}"
mkdir "$TMPDIR"
chmod 700 "$TMPDIR"
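As a rough sketch, a Slurm batch script might use such a job-specific directory as follows (the scratch paths, file names, and program name are purely illustrative):
#!/bin/bash
#SBATCH -n 1
# Create a private, job-specific directory on the node-local disk
TMPDIR="/tmp/${USER}-${SLURM_JOB_ID}"
export TMPDIR
mkdir -p "$TMPDIR"
chmod 700 "$TMPDIR"
# Stage the input to fast local disk, run, then copy results back to scratch
cp ~/scratch/myjob/input.dat "$TMPDIR/"
cd "$TMPDIR"
~/scratch/myjob/my_program input.dat > output.dat
cp output.dat ~/scratch/myjob/
# Clean up (the system will in any case remove your /tmp files after the job ends)
rm -rf "$TMPDIR"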
The nodes in the standard
partition on Zaratan have at least 1 TB of storage available in
/tmp
. This is based on
solid state disks
, but using a traditional disk interface (SAS)
instead of the faster NVMe interface. The nodes in the
serial partition on
Zaratan have at least 10 TB of solid state disk space mounted on
/tmp
, and this is NVMe based so is quite fast.
The Intel nodes on Juggernaut have at least 700 GB of temporary
storage per node provided by spinning hard drives. The AMD nodes
on Juggernaut have at least 300 GB of
solid-state disk
based storage on /tmp
.
See the job submission documentation for information on how to specify the amount of temporary space needed by your job.
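As a rough sketch (assuming the standard Slurm --tmp option is honored on this cluster), a batch job can request nodes with sufficient local disk like this:
#SBATCH --tmp=100G   # only run on nodes with at least 100 GB of local /tmp space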
|
While the Zaratan cluster maintains reasonable security, it is not certified or approved for the storage of sensitive information. No classified data, CUI, or HIPAA data are allowed on the cluster, nor any data at other than Low (Level 1) classification in the UMD Data Classification Table.
|
A networked file system on an HPC cluster must be able to support heavy I/O from a large number of processes running on a large number of nodes. This requires a high performance filesystem to keep up with the potential load. Generally this is done by spreading the data over a large number of file servers; with the appropriate configuration, even large single files get spread over multiple servers. This increases the ratio between the number of file servers and the amount of storage, greatly increasing the cost per terabyte but also greatly increasing the file system performance, as it allows the different tasks of a large parallel job to access different parts of the same file without overwhelming a single file server.
Because of the high relative cost, high performance file systems should only be used for storing files related to active jobs (i.e. jobs that are currently running, recently finished, in the pipeline, or for ongoing research for which you are regularly submitting jobs). They are not meant for archival storage of any type. For this reason, the high performance file systems are often referred to as scratch file systems. Typically, input data should be downloaded to the scratch file system (or copied from the medium term/SHELL file system) before the job is submitted. After the job completes, the inputs can be deleted (or returned to the SHELL storage tier), unneeded temporary files should be deleted, and precious output moved to longer term storage (e.g. SHELL or your home directory).
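A minimal sketch of this workflow, run from a login node (the project name foo, the directory and file names, and job.sh are all illustrative; ~/scratch.foo and ~/SHELL.foo are the convenience symlinks described later on this page):
# Stage inputs from SHELL to scratch and submit the job
mkdir -p ~/scratch.foo/myrun
cp -r ~/SHELL.foo/input-data ~/scratch.foo/myrun/
cd ~/scratch.foo/myrun
sbatch job.sh
# ...after the job completes, save the precious output and clean up...
cp results.tar.gz ~/SHELL.foo/
rm -rf ~/scratch.foo/myrun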
|
The scratch filesystems are for the temporary storage
of files supporting active research on the cluster only. They are NOT for
archival storage. Files more than 90 days old on the scratch
filesystems are subject to deletion without notice by systems staff.
Please note that the scratch filesystem is NOT backed up.
If you have critical data that
must be saved, be sure to copy it elsewhere.
You are responsible for making backup copies of any valuable data.
|
|
Because much of the data generated on the cluster is of a transient nature
and because of its size, data stored in the scratch and SHELL filesystems
is not backed up.
This data resides on RAID protected filesystems, however there is always a
small chance of loss or corruption. If you have critical data that
must be saved, be sure to copy it elsewhere.
|
NOTE: As the HPC clusters are intended for research, all content under a project's scratch directory tree, like all other data on the cluster, is considered to be research related, and is at some level considered property of the University and the faculty PIs of the allocations through which you receive access to the cluster.
With Zaratan, we have switched to a project based file organizational structure for scratch. If you are a member of a project named foo (e.g. you have access to a Slurm allocation whose name starts with foo-, such as foo-aac) on Zaratan, you will have access to a directory tree starting at /scratch/zt1/project/foo. This directory is, by default, only accessible to members of the associated project. The managers of the project will have write permission to this directory. Underneath it, there will be two directories by default: shared and user.
By default, all members of the project will have read-write access to the shared directory for the project. This is intended to facilitate collaboration between members of the research team. If there are static data files to be shared among the team, but which should be read only, you can place them here, but it is recommended you remove the group write permission on such data to prevent other users from accidentally overwriting it, as shown in the example below.
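For example, a minimal sketch of making a shared dataset read-only for the group (the project name foo and the dataset name are illustrative):
chmod -R g-w /scratch/zt1/project/foo/shared/reference-dataset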
Every user in the project receives a "personal" directory under the user subdirectory. By default, this directory and all that is underneath it is only readable by the user it is named after; however, the contents are group-owned by the Unix group for the research project, and the user can opt to open up permissions to share files with other members of the project (see the discussion of sharing files in scratch space below).
To facilitate access to your "personal" directory, the system
will by default create a symlink in your home directory
named scratch.foo
(where foo
is the name of the project under the /scratch/zt1/project
directory). This link is a "pointer" to your "personal" scratch
directory for the specified project, similar to a short cut on
Windows. You can cd to it, or use it in paths, and it will
be resolved to your personal directory in scratch space.
E.g., if you have a file
/scratch/zt1/project/foo/user/YOURUSERNAME/somedir/somefile.txt
,
the command cat ~/scratch.foo/somedir/somefile.txt
will
output the contents of the file. Note that there is only a single
copy of the file: if you do rm ~/scratch.foo/somedir/somefile.txt
,
that file will be deleted, and will no longer be accessible under
either path.
Since most people only belong to a single project, we also
create a shortened symlink scratch
in your
home directory, which is the same as scratch.foo
if you only belong to a single project. If you belong to multiple
projects (e.g. foo and bar), then scratch
will still
be defined, and it will either point to scratch.foo
or scratch.bar
, depending on the order in which you
were added to the two projects. You can see which by issuing the command ls -l ~/scratch (do not give a trailing '/' after scratch). To change what it points to, you can
issue an ln -sf
command; e.g. if it points to foo
but you want it to point to bar, you can give the command
ln -sf ~/scratch.bar ~/scratch
. Again, even
though there are multiple paths that you can use to reach the
data, only one copy exists.
Juggernaut still uses a user based directory structure for its scratch filesystem, but that will likely change soon.
There are two main technologies used at UMD for the high performance scratch filesystems, and the cluster which you are using determines which technology is employed: Zaratan uses BeeGFS, while Juggernaut uses Lustre. For the most part, you can use either filesystem without really paying attention to the underlying technology. However, to avail yourself of the more advanced features, like control over how files are striped across the file servers, you will need to know which technology your cluster uses.
|
The BeeGFS filesystems are NOT BACKED UP.
Any valuable data should be copied elsewhere (home directory
or off cluster) to prevent loss of critical data due to hardware issues.
You are responsible for backing up any valuable data.
|
This section discusses the usage of the BeeGFS scratch filesystem found on some UMD HPC clusters. The Zaratan HPC cluster has a 2 PB BeeGFS based scratch filesystem.
The scratch filesystems have quota limits in place to prevent excessive use. However, to ensure there is adequate space for everyone using the cluster, this space is still only to be used for temporarily storing files in active use by jobs and projects currently running on the system. I.e., when your job is finished, remove all the data files, etc. that are no longer needed. See the sections on archival storage or SHELL storage for a discussion of some of the storage options available to you if you need to retain the data for longer periods of time.
To ensure adequate scratch space remains available for all users, the scratch filesystems are subject to an automatic purge policy. Files older than 90 days are subject to automatic removal without warning. Data that needs to be kept on the cluster for longer periods of time should be kept in the medium-term SHELL storage. Users are responsible for moving their own data between the various filesystem types.
Every allocation in a project has an associated amount of scratch storage included in the allocation (although such may be 0). The scratch allotments from each of the allocations underneath a project are summed together to obtain a combined total scratch quota for the project. All members of any allocation underneath the project receive equal access to the entire combined scratch quota.
Note that by default, we only apply quotas at the project level, so e.g. if a project lists a 1 TB scratch quota, that means that the combined scratch usage of all members of that project must not exceed 1 TB. If your colleagues are already consuming that 1 TB, then there is nothing left for you. Such matters are best worked out at the research group level; preferably by the team members involved, with the PI and/or project managers stepping in if needed. If necessary, we might be able to apply per user quotas for the problematic users, but we prefer not to do so if possible. By default, there are no per user quotas on scratch space.
You can check the scratch filesystem quota and usage for the projects
to which you belong with the scratch_quota
command. Without
any arguments, this command will display the disk space quota and usage
for all projects you belong to, and your personal usage. Most people
only belong to a single project, in which case the output would look
like
login.zaratan.umd.edu> scratch_quota
# Group quotas
Group name Space used Space quota % quota used
zt-test 532.462 GB 1.100 TB 48.41%
# User quotas
User name Space used Space quota % quota used % of GrpTotal
payerle 316.717 GB unlimited 0 59.48%
In the example above, the user belongs to a single project, with
a corresponding Unix group zt-test, which has a scratch quota of
1.1 TB. The members of the project have a combined usage of 532 GB,
which is 48% of the quota. In the User quotas section, we see the
username of the user who ran the command (payerle
), and
that he is using 317 GB, which is 59% of the total usage of the
project (the 532 GB above). The unlimited
under the
Space quota
column for that user means that there
is not a user-specific quota being applied; this user is still subject
to the group quota (1.1 TB in this case). The % quota used
column displays the usage as a percentage of the quota; for the
user line, since there is no user level quota, this value is zero.
If an user level quota was imposed, it would show how much of that
quota has been used.
If you belong to multiple projects, the scratch_quota
command without arguments will list the usage for each such project
in the Group quotas
section, along with a line totaling
them all. The % of GrpTotal
column in the users section
will be in reference to that total.
The scratch_quota
command has a fair number of optional
flags that can be given to control its behavior; these can be enumerated
in full with the --help
flag. Some flags you might
find useful:
--users: As the primary quotas for the scratch filesystem are project based, it is sometimes useful to see the usage of other members of the project. If this option is given, the command will list all members (of all projects) in the users section. Note that the usage given for a user represents the total usage of that user across all projects the user belongs to --- there is no simple way to separate out a user's usage for a given project. Because of this, it is possible to get results which could be confusing at first glance. E.g., if your project has a 1 TB quota, and a user lisa belongs both to your group and to another group with a 10 TB quota, and lisa is using 0.1 TB with your group but 2 TB with the other group, the command will show her usage as 2.1 TB, which is more than double the quota for your group. This needs to be kept in mind when interpreting the output. Most users, however, only belong to a single project.
--show-files: While we have thus far only discussed the quota in terms of the amount of disk space, there is also a quota on the number of files allowed. The scratch filesystem is optimized to handle large files; very large numbers of small files can be problematic for performance. To catch such problems before they stall the system, we limit the number of files per TB of disk storage. We are still working out the thresholds at which things become problematic, and we will automatically increase the file count limits for groups as they approach our more conservative limits, but we may contact you if there seems to be a problematic trend. This flag will display the file count limits and usage.
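For example, you could combine these options to see both per-user usage and the file count limits in one report:
scratch_quota --users --show-files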
If you are comparing the usage, etc., returned by the scratch_quota command with the usage reported by the du command, please note that there are multiple ways to compute disk usage due to complications like internal fragmentation, indirect blocks, sparse files, etc. The scratch_quota command takes the usage numbers straight from the underlying quota system. The du and other Unix commands use somewhat different algorithms, which sometimes results in significantly different values. Also, the quota system and scratch_quota command use SI units (e.g. 1 GB = 10^9 bytes, and similarly for other prefixes), whereas the du command (and most Unix commands) by default use binary units (1 GiB = 1024^3 bytes) -- see the glossary entries for GiB and GB for more information. If you use the du command, it is recommended that you use it with the additional flags --si --apparent-size to get more directly comparable results.
As the HPC clusters are intended for research, all content in your personal scratch directory, like all other data on the cluster, is considered to be research related, and is at some level considered property of the University and the faculty PIs of the allocations through which you receive access to the cluster.
Every project has an Unix group associated with it, and every member of
any allocation under the project is a member of that Unix group. The scratch
space for a project is group owned by this Unix group. The root of the
project directory and the user
directory are readable (but not
writable) by any member of the group (the primary PI of the project does have
write permission on the root directory).
The share
directory, by default, gives read-write access to all
members of the project's Unix group; as the name implies this directory is
intended to allow for easy sharing of files between members of the project.
This is a good place to store datasets needed for jobs for multiple users
in the group, etc.
The user directory contains personal subdirectories for
every member of an allocation underneath the project, named after the
user. These directories are group owned by the Unix group for the project,
although by default the permissions are set so that only the owning user
can read and/or write to files underneath the personal subdirectory.
NOTE: while we describe these subdirectories as "personal",
please note that all content on the HPC clusters, including the "personal"
scratch directories, are considered research data and are under the
ownership of the PI(s) of the project. I.e., systems staff will grant any
requests by the PI of the project for access to this data.
If you wish to allow access to files in your "personal" scratch directory
to all other members of your group, this can be done by simply changing the
Unix permissions on the files and directories. To grant read access to a file,
you can use the chmod g+r
command on the file. Note that you
also need to grant the g+rx
permission to all parent directories
of the file being shared; please note that this will allow others in the
group to list all files, as well as read any files with read permission set.
For example, if you have the username testuser
and belong
to the test
project and you wish to share a file
testfile
under the testdir
directory of your
personal scratch space with the rest of your group, you could do something
like:
chmod g+r /scratch/zt1/project/test/user/testuser/testdir/testfile
chmod g+rx /scratch/zt1/project/test/user/testuser/testdir
chmod g+rx /scratch/zt1/project/test/user/testuser
The chmod
in the above is the name of the command being
used (change mode). The g+r and g+rx arguments instruct the command to change the group permission
of the file to add the read (r) or read plus execute (rx) permission to the
specified file or directory. The execute permission on directories (as
opposed to regular files where it means one can execute the file) grants the
recipient of the permission the ability to list files in the directory.
You can provide write access to a file and/or directory in a similar fashion,
just adding a "w" permission in addition to the "r" permission for files.
So to grant write permission to the file testfile
, you can use
the command chmod g+rw testfile
; note that to grant write
permission to the file testfile, you only need to grant rx permission to the
parent directories (i.e. you do not need to grant write permission to the
directories). You can grant write permission to a directory with the
command chmod g+rwx testdir
; note that this will give all members
of your Unix group the ability to create, edit, and delete any files in that
directory. We recommend that you grant write permission only sparingly.
The above scenarios are limited to granting access to the entire Unix group
for your project. While this is probably the most common case, there are
cases wherein you wish to grant access only to a select group of users in
your Unix group (as opposed to all users in the group), and/or to select users
which are not part of your project. In these cases, you can use POSIX
Access Control Lists (ACLs) to grant permissions. Using the same
testfile
as above, if you wished to grant read access to that
file to an user someuser
, you could use a command like:
> setfacl -m u:someuser:r /scratch/zt1/project/test/user/testuser/testdir/testfile
> setfacl -m u:someuser:rx /scratch/zt1/project/test/user/testuser/testdir
> setfacl -m u:someuser:rx /scratch/zt1/project/test/user/testuser
>
> getfacl /scratch/zt1/project/test/user/testuser/testdir/testfile
getfacl: Removing leading '/' from absolute path names
# file: scratch/zt1/project/test/user/testuser/testdir/testfile
# owner: testuser
# group: zt-test
user::rwx
user:someuser:r--
group::---
mask::r--
other::---
The -m
flag instructs setfacl that you wish to modify the
ACL, and the u:someuser:r
or u:someuser:rx
argument
means grant the user (u
) someuser
the read
(r
) or read+execute (rx
) permission. The
final getfacl
lists the ACLs on the file. In this case it shows
that the normal Unix permissions are all permissions for the user owning the
file (user::rwx
), no permissions for the group owner
(group::---
) or others (other::---
) and the
ACL granting user someuser the read permission (user:someuser:r
).
Note that in the above example, we were assuming that someuser was a member of the test project, and therefore already had read access to the /scratch/zt1/project/test/user directory. Remember that in order to effectively be granted read or write
access to a file or directory, the user needs to have read access to every
parent directory in the chain. While all Zaratan users have read access
to the /scratch/zt1/project
directory, the project specific
directories by default only grant read access to members of the project.
So if someuser
is not a member of the test
allocation, ACLs will need to be set on some project level directories
as well, which will require approval from the PI or managers of the
project (as such could potentially adversely impact the security of
the directories of others in the project).
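If you later wish to revoke access granted this way, setfacl can remove the named-user entries again; a minimal sketch using the same illustrative paths as in the example above:
setfacl -x u:someuser /scratch/zt1/project/test/user/testuser/testdir/testfile
setfacl -x u:someuser /scratch/zt1/project/test/user/testuser/testdir
setfacl -x u:someuser /scratch/zt1/project/test/user/testuser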
By default, files on /scratch/zt1 will be striped across 4 disk arrays (typically disk arrays on different file servers), using a chunk
size of 512 kB. So the first 512 kB of a file will be on the first
disk array, the second 512 kB chunk on the second, and so forth for the
third and fourth chunks. Then it cycles back to the first array, so
the fifth chunk will again go on the first disk array, and the sixth,
seventh, and eighth chunks on the second, third and fourth arrays, and
so on and so on, until the entire file is written.
When a file is created, it inherits the striping information from the parent directory. You can view the striping parameters for an existing directory with the command:
bash-4.4$ beegfs-ctl --getentryinfo /scratch/zt1/project/test/user/payerle/testdir
Entry type: directory
EntryID: 38-634F1DC8-2
Metadata node: mds-2 [ID: 2]
Stripe pattern details:
+ Type: RAID0
+ Chunksize: 512K
+ Number of storage targets: desired: 4
+ Storage Pool: 1 (Default)
In this example, the directory /scratch/zt1/project/test/user/payerle/testdir
is striped across 4 disk arrays (the line Number of storage targets
)
with a chunk size of 512 kB. You can also use the same command to examine the
striping pattern of a file, e.g.
bash-4.4$ dd if=/dev/zero bs=1k of=/scratch/zt1/project/test/user/payerle/testdir/testfile count=1024
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0666556 s, 15.7 MB/s
bash-4.4$ ls -l /scratch/zt1/project/test/user/payerle/testdir/testfile
-rw-r--r--. 1 payerle zt-test 1048576 Oct 18 17:46 /scratch/zt1/project/test/user/payerle/testdir/testfile
bash-4.4$ beegfs-ctl --getentryinfo /scratch/zt1/project/test/user/payerle/testdir/testfile
Entry type: file
EntryID: 6B6-634F1E85-2
Metadata node: mds-2 [ID: 2]
Stripe pattern details:
+ Type: RAID0
+ Chunksize: 512K
+ Number of storage targets: desired: 4; actual: 4
+ Storage targets:
+ 504 @ oss-5 [ID: 501]
+ 105 @ oss-1 [ID: 101]
+ 204 @ oss-2 [ID: 201]
+ 405 @ oss-4 [ID: 401]
bash-4.4$
Currently users are not allowed to change the striping pattern, even for their own files. This is because doing so improperly can significantly degrade performance of the scratch file system as a whole. The default settings should be good for most use cases, however if you believe you have a situation which would benefit from a non-default striping pattern, please contact systems staff and explain the situation.
Lustre is a high performance distributed file system designed for HPC clusters. Files are distributed among multiple servers, even in some cases different parts of the same file are on different servers. By spreading the load across multiple file servers, this allows for the faster responses to file requests required to deal with the heavy load some parallel codes demand.
|
The lustre filesystems are NOT BACKED UP.
Any valuable data should be copied elsewhere (home directory
or off cluster) to prevent loss of critical data due to hardware issues.
You are responsible for backing up any valuable data.
|
Every user is provided a personal lustre directory when their account on the cluster is created. The location of this directory varies a bit from cluster to cluster. For a user with username username, their personal lustre directory is located at /lustre/jn10/username.
Your lustre directory is visible from the login nodes, data transfer nodes, AND from all of the compute nodes.
For the most part, you can use lustre as you would any other filesystem; the standard unix commands work, and you should just notice better performance in IO heavy codes.
Normally, lustre will keep the data for an individual file on the same
fileserver, but will distribute your files across the available servers.
The lfs getstripe
and lfs setstripe
commands
can be used to control the striping. More information can be found
in the section on Lustre and striping.
Lustre stores the "metadata" about a file (its name, path, etc) separately from the data. Normally, the IO intensive applications contact the metadata server (MDS) once when opening the file, and then contact the object storage servers (OSSes) as they do the heavy IO. This generally improves performance for these IO heavy applications.
Certain common interactive tasks, e.g. ls -l, require data from both the MDS and the OSSes, and take a bit longer on lustre. Again, these are not the operations lustre is optimized for, as they are not done frequently in IO heavy codes.
The lfs find
command is a version of find optimized for
lustre. It tries to avoid system calls that require information from the OSSes
in addition to the MDS, and so generally will run much faster than the
unoptimized find
command. Usage is by design similar to the
standard find
command.
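For example, a quick sketch of using lfs find to locate files more than 30 days old under your lustre directory (the path and age threshold are illustrative):
lfs find /lustre/jn10/username -type f -mtime +30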
If you want to see how much space you are currently using in any of the Lustre
filesystems, run the command lustre_usage
. This will show you
total usage for yourself and for any groups you belong to. Note that this
will only show you Lustre usage, and will not include any files outside
of Lustre.
login-1:~: lustre_usage
Usage for /export/lustre_1:
======================================================================
Username Space Used Num Files Avg Filesize
------------------------------------------------------------
rubble 2.3T 4134684 607.7K
Group Space Used Num Files Avg Filesize
------------------------------------------------------------
flint 4.6T 6181607 795.4K
As mentioned previously, lustre gets its speed by "striping" files over multiple Object Storage Targets (OSTs); basically multiple fileserver nodes each of which holds a part of the file. This is mostly transparent to the user, so you would not normally know if/that your file is split over multiple OSTs.
By default on the Deepthought clusters, every file is kept on a single
OST, and this striping just means that different files are more or less randomly
spread across different file servers/OSTs. This is fine for files of moderate
size, but might need adjustment if dealing with files of size 10 or 100 GB or
more. The lfs getstripe
and lfs setstripe
commands
exist for this.
The getstripe
subcommand is the simplest, and just gives
information about the striping of a file or directory. Usage is just
lfs getstripe FILEPATH
and it prints out information about the
named file's striping. E.g.:
login-1> lfs getstripe test.tar
test.tar
lmm_stripe_count: 1
lmm_stripe_size: 1048576
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 9
obdidx objid objid group
9 2549120 0x26e580 0
login-1>
The above example shows a file created using default settings. The
file in this case is on a single OST (the number of stripes for the
file, given by lmm_stripe_count, is 1). The lmm_stripe_offset gives the
index to the starting OST, in this case 9, and below that shows all the stripes (in this case, just the single one). One can use the command
lfs osts
to correlate the index to the name of an actual OST.
The lmm_stripe_size value is the size of the stripe, in bytes, in this case
1048576 bytes or 1 MiB.
While examining a file's striping parameters is nice, it is not particularly
useful unless one can also change it, which can be done with the lfs
setstripe
subcommand. Actually, the striping for a file is
NOT MUTABLE, and is set in stone at the time of file creation. So one needs
to use the setstripe
subcommand before the file is
created. E.g., to create our test.tar
file again, this time
striped over 20 OSTs and using a stripe size of 10 MiB, we could do something
like:
login-1> rm test.tar
login-1> lfs setstripe -c 20 -S 10m test.tar
login-1> ls -l test.tar
-rw-r--r-- 1 payerle glue-staff 0 Sep 18 17:02 test.tar
login-1> tar -cf test.tar ./test
login-1> ls -l test.tar
-rw-r--r-- 1 payerle glue-staff 8147281920 Sep 18 17:04 test.tar
login-1> lfs getstripe test.tar
test.tar
lmm_stripe_count: 20
lmm_stripe_size: 10485760
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 55
obdidx objid objid group
55 419995932 0x1908a11c 0
63 468577296 0x1bedec10 0
45 419403761 0x18ff97f1 0
68 435440970 0x19f44d4a 0
57 409176967 0x18638b87 0
44 377767950 0x1684480e 0
61 419414421 0x18ffc195 0
65 356701609 0x1542d5a9 0
31 408705898 0x185c5b6a 0
12 429746020 0x199d6764 0
50 379985276 0x16a61d7c 0
16 372211487 0x162f7f1f 0
46 468289628 0x1be9885c 0
10 402610097 0x17ff57b1 0
30 425031271 0x19557667 0
60 423186185 0x19394f09 0
69 496205056 0x1d937d00 0
35 409685517 0x186b4e0d 0
70 415859549 0x18c9835d 0
15 449399811 0x1ac94c03 0
We start by deleting the previously created test.tar
; this
is necessary because one cannot use lfs setstripe
on an existing
file. We then use the -c option to setstripe
to set the stripe
count, and the -S option to set the stripe size, in this case 10 MiB. One
can also use the suffixes 'k' for KiB or 'g' for GiB. The
setstripe
creates an empty file with the desired striping
parameters. We then issue the tar command to put content in the file, and
then run the getstripe
subcommand to confirm the file has the
correct striping.
As mentioned before, one cannot use the setstripe
subcommand
on an existing file. So what if we want to change the striping of an existing
file? E.g., what if we decide now we want test.tar to have 5 stripes of
size 1 GiB? Because we cannot directly change the striping of an existing file,
we need to use setstripe
to create a new file with the desired
striping, and copy the old file to the new file (you can then delete the
old file and rename the new file to the old name if desired). E.g.
login-1> lfs getstripe test.tar
test.tar
lmm_stripe_count: 20
lmm_stripe_size: 10485760
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 55
obdidx objid objid group
55 419995932 0x1908a11c 0
63 468577296 0x1bedec10 0
...
login-1> ls -l test2.tar
ls: cannot access test2.tar: No such file or directory
login-1> lfs setstripe -c 5 -S 1g test2.tar
login-1> ls -l test2.tar
-rw-r--r-- 1 payerle glue-staff 0 Sep 18 17:16 test2.tar
login-1> cp test.tar test2.tar
login-1> ls -l test2.tar
-rw-r--r-- 1 payerle glue-staff 8147281920 Sep 18 17:17 test2.tar
login-1> diff test.tar test2.tar; echo >/dev/null Make sure they are the same
login-1> lfs getstripe test2.tar; echo >/dev/null Verify striping
test2.tar
lmm_stripe_count: 5
lmm_stripe_size: 1073741824
lmm_pattern: 1
lmm_layout_gen: 0
lmm_stripe_offset: 61
obdidx objid objid group
61 419416513 0x18ffc9c1 0
31 408708503 0x185c6597 0
66 422684037 0x1931a585 0
49 429032715 0x1992850b 0
16 372213361 0x162f8671 0
login-1> rm test.tar; mv test2.tar test.tar
This only touches the surface of what can be done with striping in lustre, for additional information look at:
|
The SHELL storage tier on Zaratan is not
automatically backed up. If you have critical data that
must be saved, be sure to copy it elsewhere.
|
|
While the Zaratan cluster maintains reasonable security, it is not certified or approved for the storage of sensitive information. No classified data, CUI, or HIPAA data are allowed on the cluster, nor any data at other than Low (Level 1) classification in the UMD Data Classification Table.
|
|
The SHELL storage tier is not accessible from the compute
nodes. You can access it from the login nodes and/or from
remote systems with AFS clients configured to access
it. This is because the SHELL storage tier is not optimized
for the demands of high performance computing, and the
need for AFS tokens makes it difficult to
effectively use in batch jobs.
|
In addition to your home and scratch spaces, you will also have (for each project you are a member of) space on the SHELL file system. This is a medium term storage tier, intended for the storage of data which will be needed for work done on the HPC cluster in the future, but is not needed for active jobs or jobs in the pipeline (i.e., the "future" in this case is longer than appropriate for storage on the scratch file system). This storage tier is much larger than the scratch storage system, and therefore can hold data for longer periods of time. It is not backed up, nor are there any guarantees that the storage will last beyond the lifetime of the Zaratan cluster (5 years or so), and therefore it is not suitable for archival purposes. It is intended for the storage of large amounts of data related to research on the cluster, even if the data is not related to jobs which are running currently or in the near future.
Unlike the scratch file system, the SHELL file system is not designed for high performance --- this makes it significantly cheaper which is why we can afford more of it. For this reason, we do not make it accessible from the compute nodes, as a relatively small number of jobs doing heavy I/O could potentially overwhelm the file system. The recommended usage is along these lines:
|
Because the SHELL storage is protected by Kerberos, you
must have valid kerberos tickets/AFS tokens to access your
files. If you are getting permission denied or similar errors,
please see the section on the SHELL
filesystem and AFS tokens.
|
The SHELL storage tier uses a project based directory structure similar to that used by the scratch tier on Zaratan. Each project with an allocation of SHELL storage will have a directory on the SHELL file system at /afs/shell.umd.edu/project/foo, where foo is a short name for the project.
As the HPC clusters are intended for research, all content under this project SHELL directory, like all other data on the cluster, is considered to be research related, and is at some level considered property of the University and the faculty PIs for the corresponding project.
By default, this root directory is readable by all members of
the project, and not accessible by anyone else. Project managers
are able to write to this directory, but other members of the
project only have read access to the top level SHELL directory
for the project. There are two subdirectories, share
and user
underneath that root directory by default.
Normally all users of the project have read-write access to
the share
directory. This directory is intended
to facilitate collaboration among members of the research team.
If you place data here which everyone should be able to read but
which should not be overwritten/modified/etc., then you should
remove the group write permissions on the data to prevent other
users from accidentally overwriting the data.
In addition, every user in the project receives a "personal"
directory under the user
subdirectory. By default,
this directory and all that is underneath it is only readable
by the user it is named after, however you can easily
grant access to specific subdirectories
of your "personal" directory to the entire project or to select
subsets.
For your convenience, the system creates a symbolic link
SHELL.foo
(where foo is the
name of the project, as it appears under
/afs/shell.umd.edu/project
) in your home directory
pointing toward your personal directory under the foo
project's SHELL space. This is a pointer, similar to a
short-cut on Windows; files placed under that symbolic link
are actually on the SHELL storage tier --- there is only one
copy of each file despite it having multiple paths, so if you
modify/delete the file under one path, the change is visible
in all paths. If you belong to multiple projects, there should
be similar symbolic links for each project. The system also
creates a symlink SHELL
under your home directory,
which points to the same location as one of the
SHELL.foo
--- since most users only
belong to a single project, the two symlinks will point to the
same location.
If you belong to multiple projects, the SHELL
symbolic link will point to your personal directory in the
SHELL space for one of the projects you belong to, but the
choice of which depends on the order in which you were added to
the projects. You can use the command
ls -l ~/SHELL
to see which project's SHELL space
it is using, and if you wish to change the symlink to point to
the SHELL space of a different project, e.g. foo,
you can use the command
ln -sf ~/SHELL.foo ~/SHELL
The underlying
Auristor filesystem
is
a networked file system which is made
available to machines outside the cluster.
This filesystem is "volume" based; data is stored in units called
volumes. These volumes look like normal directories, and for most
purposes behave like directories, however not all directories
are volumes. A major difference between directories and volumes is
that volumes have an associated size limit; for various reasons the
filesystem prefers a large number of smaller volumes rather than a
smaller number of large volumes. Volumes may be nested in the
filesystem, e.g. for a project bee-research
with an
user named johndoe
, typically both
/afs/shell.umd.edu/project/bee-research
and
/afs/shell.umd.edu/project/bee-research/user/johndoe
will each be volumes. Files under
/afs/shell.umd.edu/project/bee-research/user/johndoe
,
directly underneath or in some chain of subdirectories beneath
that directory, will count towards the size limit of the johndoe
volume (as long as none of the "directories" in the subdirectory
chain is itself a new volume), but not towards the size limit
of the bee-research volume.
For each project, the system will automatically create a volume at the root of the project's SHELL space, and one volume under the user subdirectory for each user belonging to the project. By default, all of these volumes have a 1 TB limit on the amount of data that can be stored in them. These caps on the size of a volume can be changed at the request of the PI or a designated manager of the project (just have them send email to hpcc-help@umd.edu); just tell us which volume (i.e. the path to the volume "directory") and the new size. These same people can also request the creation of new volumes in the same fashion; just tell us where the volume should be mounted (i.e. the path to the volume "directory") and the desired size. As stated previously, the underlying filesystem prefers a large number of smaller volumes rather than fewer larger volumes. We are reluctant to create volumes larger than 20 TB, and prefer for them to be 10 TB or even smaller.
You can view the current quotas and usage of the SHELL space
for projects you belong to with the shell_quota
command. Note: you need to have valid
AFS tokens in order to use this
command. By default, the command shows a summary of the
quota and usage of the SHELL storage tier for all projects
you belong to, e.g.:
login-1:~$ shell_quota
SHELL storage rooted at /afs/shell.umd.edu/project/test:
Total Project Quota: 2.50 TB
Total Project Usage: 1.46 TB (58.6% of Quota)
SHELL storage rooted at /afs/shell.umd.edu/project/test2:
Total Project Quota: 1.00 TB
Total Project Usage: 11.26 kB (0.0% of Quota)
In the above example, the user running the command belongs
to two projects, named test
and
test2
. The test
project has a
SHELL quota of 2.5 TB, with 1.5 TB ( 58.6% of the 2.5 TB
quota) used. This 1.5 TB is the sum of the usages for all
subvolumes of the root volume of test
project
SHELL storage tree. The test2
project has a
1 TB quota of which about 11 kB is used.
While this summary information is often all you wish to see,
sometimes you will wish to see more detail, and in particular
wish to see which subvolumes are contributing the most to the
overall usage. You can use the same shell_quota
command with the --show_volumes
flag to see
the summary plus information on usage and caps of each subvolume
for the project. In the example below, we also add the
--project test
flag to restrict output to the
test
project, and the --verbose
flag
to display some additional information:
login-1:~$ shell_quota --show_volumes --project test --verbose
SHELL storage rooted at /afs/shell.umd.edu/project/test: (volumes p.test*)
Total Project Quota: 2.50 TB
Total Project Usage: 1.46 TB (58.6% of Quota)
Total Allocated : 8.21 TB (328.5% of Quota)
Data from: 2023-03-08 09:56:28
Subvolumes: 9
Mountpoint MaxSize % of Disk % of % of
rel to root of vol Prj Qta Used Max Used Prj Usg
------------------------------------------------------------ -------- -------- -------- -------- --------
<ROOT>                                                       1.02 TB    41.0% 15.36 kB     0.0%     0.0%
user/larry 1.02 TB 41.0% 2.05 kB 0.0% 0.0%
user/moe 1.02 TB 41.0% 938.21 GB 91.6% 64.1%
user/curly 1.02 TB 41.0% 180.69 GB 17.6% 12.3%
user/groucho 1.02 TB 41.0% 2.05 kB 0.0% 0.0%
user/harpo 1.02 TB 41.0% 345.67 GB 33.8% 23.6%
user/harpo/rwtest 21.47 GB 0.9% 3.07 kB 0.0% 0.0%
user/chico 1.02 TB 41.0% 797.70 kB 0.0% 0.0%
user/zeppo 1.02 TB 41.0% 2.05 kB 0.0% 0.0%
login-1:~$
In this case, the output is restricted to the project
test
; note that you can only
display information about projects you belong to. The
summary at top is largely the same as before, with two
additional lines due to the verbose flag. The first
new line is the Total
Allocated
line, which lists the sum of the size
limits for all volumes in the project, and compares it
to the quota. In this case, the sum of the maximum sizes
of all the subvolumes is 8.2 TB (8 volumes with 1.02 TB
and one with 21 GB), which is 328% of the 2.50 TB. This
is an example of oversubscription, which we discuss more
below.
The other new line in the summary section due to the
--verbose
flag is the Data from
line. The data used to produce the output
of this command is cached and updated several times a day,
and this line informs you when the data was last updated.
If you recently added or deleted files, the impact will not
be seen immediately.
Following the summary section is a list of all subvolumes
belonging to the project and their usages. The first
column lists the mountpoint of the volume, relative to
the root mountpoint (/afs/shell.umd.edu/project/test
in this example). The special case <ROOT>
represents the root volume itself --- whereas the summary
section sums up the total maximum volume sizes and usages
for all volumes in the project, this entry lists the maximum
size and usage just for the root volume. The next column
MaxSize of vol
lists the per volume limit of
size for the volume. The % of Prj Qta
column
shows that per volume size limit as a percentage of the
total quota for the project; e.g. in this case the 1.02 TB
volume limits are 41% of the project's 2.5 TB quota.
The Disk Used
column lists the usage for this
specific volume. The % of Max Used
shows that
usage as a percent of maximum size of the volume, and
% of Prj Used
shows that usage as a percent
of the total usage for the project.
There are additional options you can provide to the
shell_quota
command; the aforementioned are just
the ones we believe will be most useful to most people. Use
the command shell_quota --help
to see the full
list of options.
Oversubscription is permissible; i.e., the sum of the volume size caps for the various volumes within a project's SHELL space may exceed the SHELL storage allocation "quota" for the project, as long as the sum of the actual disk usage for the various volumes of the project does not exceed that quota. Such oversubscription even occurs automatically --- base allocations from the AAC generally only have 1 TB of SHELL storage allotted to them, and the system creates a root volume and volumes for each user with 1 TB volume size caps.
The operating system will only prevent users from significantly
exceeding the per volume limits; there is no mechanism to
warn or prevent users from exceeding the project's SHELL
quota before it is exceeded. E.g., in the example above,
both users moe and groucho have volumes with 1 TB per volume
limits, and almost no usage. They can both add up to 1 TB of
data to their respective volumes, and the operating system will
not prevent it because it is within the 1 TB per volume limit.
However, assuming there was not a drastic reduction in usage in other volumes, that would bring the total usage for the test project up to around 3.5 TB, significantly over the 2.5 TB quota.
If at some point the usage over all of the volumes for a project exceeds the project's SHELL allocation "quota", we will send email to the PI and managers of the project informing them of the situation and asking that the issue be rectified in a reasonable amount of time, typically one week. Rectifying the issue can include a combination of:
If you need help figuring out how to rectify the situation, or if there are extenuating circumstances which might warrant an extension of the time limit to resolve the matter, please let us know. While our responsibilities towards other users on the cluster will not allow us to ignore such overages, we are willing to work with you to find a mutually acceptable solution.
The SHELL storage tier uses the Auristor File System, which is basically an enhanced version of the AFS filesystem that has been in use by the campus Glue system for many years. This is a global filesystem with clients available for all modern operating systems, which makes the content securely accessible (secured by Kerberos credentials) wherever it is needed.
|
Because the SHELL storage is protected by Kerberos, you must have valid
kerberos tickets/AFS tokens to access your files. These tokens will normally
be created for you automatically when you login using your password, or if
you do a passwordless login using Kerberos tickets obtained
on another system. They will not be created if you login
in using RSA or other public key authentication --- this authentication method
is simply unable to obtain Kerberos tickets for you. Please also note that
Kerberos tickets and AFS tokens have expiration times; typically 8-24 hours
after they are obtained. If your AFS tokens expire, you will no longer be
able to access SHELL storage until you renew them.
|
If you do not have AFS tokens or they expired, you can renew
Kerberos tickets and AFS tokens by issuing the command
renew
on the cluster. This generally will require
you to enter your UMD LDAP directory password.
If you have set up RSA or similar public key authentication for ssh to avoid entering a password every time you access the cluster, this becomes problematic. While you can still use such to access much of the cluster, you will need to use the kinit and aklog commands to get AFS tokens to access the SHELL storage, which defeats the goal of password-less login. Unfortunately, the nature of public key authentication simply makes it incapable of obtaining Kerberos tickets and AFS tokens for you.
A better approach is to install a Kerberos client on your workstation, and run the kinit command once on your workstation before opening sessions to the cluster. This will require you to use your password, but it is a once-a-day or so type of operation. You can then configure ssh to allow GSSAPI authentication when access the cluster login nodes, which will allow you to access the cluster without entering your password again. The forwarded Kerberos tickets can then be used to obtain AFS tokens for you, allowing you to access SHELL storage.
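As a rough sketch, the corresponding OpenSSH client configuration on your workstation (in ~/.ssh/config) might look like the following; the host alias is arbitrary and YOURUSERNAME is a placeholder:
Host zaratan
    HostName login.zaratan.umd.edu
    User YOURUSERNAME
    GSSAPIAuthentication yes
    GSSAPIDelegateCredentials yes
After running kinit on your workstation, ssh zaratan should then log you in without prompting for a password and forward your Kerberos credentials so that AFS tokens can be obtained on the cluster.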
The Auristor filesystem used as the underlying filesystem of the SHELL storage tier is a potentially globally distributed file system which can securely provide access to files and data based on Kerberos credentials. What this means is that by installing the appropriate client software on your local workstation or laptop, you can access your SHELL space as if it were a local filesystem.
The process to do this depends on the OS running on your local workstation, and we split into different sections for different OSes:
Default Cell should be set to shell.umd.edu (for the SHELL storage tier on Zaratan).
Integrated logon should be set to Disable.
Cache size: keep the default.
The Custom Setup screen gives you options for what to install; just use the defaults.
Click the Install button to proceed.
Click the Yes button to proceed.
Click the Finish button to exit the setup wizard.
After the system reboots, you can open a command prompt from the Start Menu
and issue the command:
kinit MYUSERNAME@UMD.EDU
followed by aklog
,
replacing MYUSERNAME with your login name on Zaratan (which should
be the part of your @umd.edu
or @terpmail.umd.edu
email address to the left of the "at" sign (@
), and will
normally be all lowercase). The @UMD.EDU
must
be all uppercase. This will give you Kerberos tickets on your Windows
workstation. This kinit
step will need to be repeated every
time you reboot your workstation (at least if you plan to use password-less
ssh in that session), or when your Kerberos tickets expire (typically one
day).
The above steps installed the OpenAFS client on your system, and you have valid Kerberos tickets. We now discuss:
Although the above kinit
step will obtain Kerberos tickets for
you, you still need to configure your ssh client to authenticate to the
remote system using these tickets. The steps to accomplish this depends on
the specific ssh client you are using.
For the putty
ssh client, do the following:
In the configuration tree, expand SSH, then Auth, and select the GSSAPI pane. On this pane, make sure the two boxes Attempt GSSAPI authentication and Allow GSSAPI credential delegation are both checked.
Then go to Connection and Data in the configuration menus, and in the field Auto-login username enter your username on the Zaratan cluster.
Although the above aklog
step will obtain AFS tokens for
you, you still need to mount the SHELL directory or directories on
your workstation. To do this:
Go to Computer | Map Network Drive. This should open a Map Network Drive window.
In the Drive field, you can select which drive letter the SHELL folder should be mounted as.
In the Folder field, you provide the path to the SHELL folder you wish to mount. You should enter the full path to the directory, with forward slashes (i.e. '/') converted to backslashes ('\') and two backslashes at the start. E.g., if you want to mount your personal SHELL directory from project foo, and your username is smith, you would enter \\afs\shell.umd.edu\project\foo\user\smith.
Note that you cannot use the symlink ~/SHELL or similar here. The Reconnect at login box sounds tempting, but since you will not have AFS tokens at logon (not until you issue the kinit and aklog commands), we do not believe it will provide the functionality desired.
Click the Finish button.
This should result in the SHELL directory specified being mounted at the drive letter specified. You might wish to mount your personal directory and the share directory for the same project to different letters, or if you belong to multiple projects, mount your personal space from each project to a different drive letter.
This section is still under construction.
You can download the OpenAFS Client Installer for Mac from Auristor.
This section is still under construction.
You can download the OpenAFS Client Installer for Linux from Auristor.
A proper archival storage tier is for long term storage of data, especially data which is infrequently accessed. This generally requires automatic backups and guarantees on the lifetime of the data. Unfortunately, the HPC clusters at UMD do not currently provide an archival storage tier. The BeeGFS and lustre filesystems on the various HPC resources are intended for ongoing research on the cluster. These storage resources are limited and are not intended for long term or archival storage, as they are needed for people to run jobs on the cluster. You are required to delete data that is no longer needed, and to move data that needs to be retained elsewhere once it is not needed to support queued or running jobs.
Campus provides users the ability to store large amounts of data via on campus and cloud-based services, namely:
Campus maintains a networked file storage system which can be accessed either by the NFS protocol (suitable for access by Unix-like systems) or the CIFS/SMB protocol (suitable for access by Windows-like systems).
Pricing and other information, along with links to the forms to request such service, can be found at the Networked Storage service catalog
Campus provides the ability to store large amounts of data on Google's G Suite drive. Please see the Google drive service catalog entry for more information, including restrictions on what data can be stored there and how to set up your account if you have not done so already.
The recommended utility for accessing Google drive from the HPC cluster is to use the rclone command:
In addition to supporting many Cloud storage providers, it also has features to prevent exceeding Google's data rate limits.
The gdrive command is also available, but it tends to exceed Google's rate limits when moving large amounts of data.
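As a minimal sketch, once a Google Drive remote has been set up with rclone config (the remote name gdrive and the paths here are illustrative), results can be copied off the cluster like this:
rclone copy ~/scratch.foo/myrun/results gdrive:hpc-results --progress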
Campus also provides the ability to store large amounts on the Box cloud-based storage platform. Please see the UMD Box service catalog entry for more information, including restrictions on what data can be stored there and how to set up your account if you have not done so already.
The UMD Box service can be accessed from the cluster in several ways. We recommend using the rclone command:
Alternatively, one could use an ftps
client.
NOTE: although similar in name and function,
ftps
is not the same as sftp
. They
are different protocols, and Box does NOT support sftp at this
time. Probably the best command line ftps utility is the
lftp
command; see:
Your home directory as configured is private and only you have access to it. Any directories you create outside your home directory are your responsibility to secure appropriately. If you are unsure of how to do so, please submit a help ticket requesting assistance.
If you're a member of a group, you'll want to make sure that you give
your group access to these directories, and you may want to consider
setting your umask so that any files you create automatically have
group read and write access. To do so, add the line umask
002
to your .cshrc.mine
file.
|
If your jobs process sensitive data, it is strongly
recommended that you submit all such jobs in
exclusive mode
to prevent other jobs/users from running on the same node(s) as your
job.
|