This page covers preliminary information about using the environment on the HPC clusters, i.e., what you need to know how to do before we can even start talking about submitting jobs to the system using the command line.
Note: Users who are uncomfortable or unfamiliar with the Unix/Linux command line should consider using the OnDemand Web portal for accessing the cluster. While the command line is the more powerful interface to the cluster, the OnDemand portal has a gentler learning curve. The OnDemand portal is only currently available for the Zaratan and Juggernaut clusters.
Each cluster has at least two nodes available for users to log into. From these login nodes, you can submit and monitor your jobs, compile codes, look at the results of jobs, etc. These nodes can also be used for transferring files and data between the cluster and other systems. However, for large data transfers, there are data transfer nodes (listed below) which should be used instead of the login nodes.
|
DO NOT RUN computationally intensive processes on the login nodes!
These are in violation of policy, interfere with other users of the
clusters, and will be killed without warning. Repeated offenses
can lead to suspension of your privilege to use the clusters.
|
For most tasks you will wish to accomplish, you will start by logging into one of the login nodes for the appropriate cluster.
To log into the cluster, you need to use the Secure Shell protocol (SSH). This is usually installed by default as ssh on Unix systems, and clients are available for Windows, Mac, and even Android; however, on non-Unix systems you typically must install an SSH client.
Once you have an ssh client installed on your system, you just tell it you wish to connect to one of the login nodes for the desired cluster. Assuming your official UMD email address is johndoe@umd.edu, the login nodes and username are:
Cluster | Login Node | Username |
---|---|---|
Zaratan | login.zaratan.umd.edu | johndoe |
Juggernaut | login.juggernaut.umd.edu | johndoe |
|
NOTE: Zaratan requires multifactor authentication (MFA) using the standard campus DUO MFA system. Either you must be on the standard campus VPN (which requires MFA to authenticate), or when you ssh you will be prompted, after entering your password, to enter a passcode or a single digit to send a "push" to a phone.
|
On a TerpConnect/Glue system, you would just issue the command ssh login.zaratan.umd.edu to connect to Zaratan, or similarly ssh login.juggernaut.umd.edu to connect to Juggernaut. The Unix ssh command by default assumes your login name on the remote system is the same as on the local system, which is true for the UMD HPC clusters and for TerpConnect/Glue systems.
From other Unix systems, you might need to specify your cluster username, e.g. ssh USERNAME@login.zaratan.umd.edu or ssh -l USERNAME login.zaratan.umd.edu, where USERNAME is your Zaratan username.
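As a minimal sketch of the commands above, the following snippet builds the login host name from a cluster name and prints the matching ssh command. The variable names and the johndoe username are illustrative placeholders only.

```shell
# Hypothetical values: replace johndoe with your own cluster username.
HPC_USER="johndoe"
CLUSTER="zaratan"            # or "juggernaut"

# Login nodes follow the pattern login.<cluster>.umd.edu (see the table above).
LOGIN_HOST="login.${CLUSTER}.umd.edu"

# Print the command rather than running it, so the snippet is safe to try anywhere.
echo "ssh ${HPC_USER}@${LOGIN_HOST}"
```

Running the printed command (or the equivalent ssh -l form) starts the actual login.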
Do not try to connect to zaratan.umd.edu or juggernaut.umd.edu. These are the master nodes for the clusters, and you do NOT have access to them.
Starting with Zaratan, we are requiring multifactor authentication to access the HPC clusters. We are using the standard campus DUO MFA system.
Since the standard campus VPN does multifactor authentication, if you are on the VPN, ssh connections to the login nodes will not prompt you for multifactor --- you just need to enter your password as before.
If you are not on campus VPN, when you ssh to one of the Zaratan login nodes, you will first be prompted for your password and then you will be prompted to enter a passcode or a single digit from a menu for a "push" or phone call for verification. E.g., you will see something like the session below, and at the passcode prompt you can enter a passcode from the Duo app on your mobile phone, or have Duo send a push or make a phone call to a previously registered device for the second authentication factor.
For more information, see the web page on the campus Duo Multifactor Authentication System.
my-workstation:~: ssh login.zaratan.umd.edu
* * * WARNING * * *
Unauthorized access to this computer is in violation of Md.
Annotated Code, Criminal Law Article sections 8-606 and 7-302 and the
Computer Fraud and Abuse Act, 18 U.S.C. sections 1030 et seq. The University
may monitor use of its computing resources as permitted by state
and federal law, including the Electronic Communications Privacy Act,
18 U.S.C. sections 2510-2521 and the Md. Annotated Code, Courts and Judicial
Proceedings Article, Section 10, Subtitle 4. Anyone using this system
acknowledges that all use is subject to University of Maryland Policy
on the Acceptable Use of Information Technology Resources available at
http://www.umd.edu/aup.
By logging in I acknowledge and agree to all terms and conditions
regarding my access and the information contained therein.
To report problems or request assistance call the Help Desk at 301-405-1500
Password:
Enter a passcode or select one of the following options:
1. Duo Push to XXX-XXX-1234
2. Phone call to XXX-XXX-1234
3. Phone call to XXX-XXX-4444
Passcode or option (1-3):
The first time you connect to the system via ssh, you might receive a warning message like:
The authenticity of host 'login.zaratan.umd.edu' can't be established.
RSA key fingerprint is e8:41:71:ac:fc:4c:08:c7:bc:0f:f0:33:95:5b:c4:e0
Are you sure you want to continue connecting (yes/no)?
The message sounds scary, but it is normal. SSH tries very hard to protect you from all sorts of hacking, and one such mechanism is to try to ensure you are talking to the system you think you are talking to. For every server you connect to, it remembers a secret (the RSA fingerprint) that proves the identity of the server, and verifies it on later connections (for brevity, this is a gross oversimplification). But it cannot verify it the very first time you connect (unless, as is the case on some campus systems, systems staff have pre-populated SSH with information about the system you are connecting to). This message is just to inform you about that.
The hostname (and the IP address, if one is shown) may vary, although the hostname should match the name of the system you want to connect to. The key type and the fingerprint will depend on the system you are trying to communicate with. To be secure, you should verify that the fingerprint matches one of the fingerprints listed below:
Host | MD5 fingerprint | SHA256 fingerprint |
---|---|---|
login.zaratan.umd.edu | e8:41:71:ac:fc:4c:08:c7:bc:0f:f0:33:95:5b:c4:e0 | tHrtumZ7yW/Ucnm/mPGcpWfIaRVO/FccsheoWDv6MWM |
login.zaratan.umd.edu | 18:44:38:42:fb:f5:b9:e7:a0:07:9a:b1:5a:cc:5e:82 | 9BISdMgfLdqjR1GmFFam1WKpOrdhlEpsxquCG1jlQDg |
login.zaratan.umd.edu | 28:2c:bf:74:75:d5:62:65:84:74:f2:0f:a8:ff:fe:12 | XGCCZxapg+RqECPFMhTfRci09sJlsKyhY/CtoTq2R8k |
login.zaratan.umd.edu | 4e:41:a6:07:3f:f6:a1:12:fc:01:6b:c6:42:de:18:4e | Ot4bTjdmv3t8Lwn2uETVlAFPzhFoQafQ7tt+oHAN69w |
login.juggernaut.umd.edu | df:b7:48:55:fa:1c:80:41:22:64:d5:f7:04:7a:10:e1 | guk52KukC0EgCBvFM1zLRXPbWFz7QQv/e+0htklsyyA |
login.juggernaut.umd.edu | 5f:33:be:43:97:b5:5c:26:f5:f8:a4:77:61:3c:54:e6 | 6z3yzKV+7y+Y4Ynee1jPYhYCLWhn9NwXPcn7CRDPcLM |
login.juggernaut.umd.edu | f2:08:8f:a8:62:51:3a:fd:d2:c1:90:9f:92:a6:f6:f9 | 5KNszbuzqNvDS+R9IcbosU1+rCYuAHtKmHzFXZ0iUz8 |
login.juggernaut.umd.edu | d7:4d:43:05:67:a2:d5:6c:cd:41:4c:0c:e7:7d:cf:33 | tFL+UfXWrw5vyHFsaWtWIH0CpN1tJjp0CAvPTjkZaEE |
If the fingerprint does NOT match, you should NOT enter your password, and you should contact system staff ASAP. In such a situation it is possible that someone is performing a man-in-the-middle attack and could obtain your password when you enter it.
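One way to make the comparison less error-prone is to let the shell do the string matching. The sketch below is illustrative only: the function name is_known_fingerprint is hypothetical, and the list holds just the first Zaratan fingerprint pair from the list above, so you would extend it with the remaining entries before relying on it.

```shell
# Fingerprints copied from the list above (first Zaratan entry only, in both
# the colon-separated and the base64 form). Extend with the other entries.
KNOWN_FINGERPRINTS="e8:41:71:ac:fc:4c:08:c7:bc:0f:f0:33:95:5b:c4:e0
tHrtumZ7yW/Ucnm/mPGcpWfIaRVO/FccsheoWDv6MWM"

# Succeeds only if the argument exactly matches one full line of the list.
is_known_fingerprint() {
    printf '%s\n' "$KNOWN_FINGERPRINTS" | grep -Fxq -- "$1"
}

# Usage: paste the fingerprint shown in the ssh prompt as the argument.
if is_known_fingerprint "e8:41:71:ac:fc:4c:08:c7:bc:0f:f0:33:95:5b:c4:e0"; then
    echo "fingerprint matches a published key"
else
    echo "NO MATCH: do not enter your password; contact system staff"
fi
```

Recent OpenSSH clients usually print the base64 form with a leading "SHA256:" label; strip that label before passing the value to the function.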
If you see a message like
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that the RSA host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
5c:9b:16:56:a6:cd:11:10:3a:cd:1b:a2:91:cd:e5:1c.
Please contact your system administrator.
you should always refrain from logging in and contact system staff as soon as possible. This message means that the server you are connecting to did not know the secret remembered by SSH for that system, as described above, so either system staff changed the keys or someone is attacking you. System staff do not change the keys often, and would email everyone well in advance of doing so; unless you received such an email, this likely means someone is attacking you. Do NOT enter your password, and contact system staff.
|
NOTE: If you log in to Zaratan using public-key authentication, you will not have Kerberos tickets or AFS tokens, so certain commands will not work. For example, you will not be able to access the contents of your SHELL storage. We recommend using kinit on your workstation to achieve a mostly "password-less" ssh capability which will provide you with Kerberos tickets and AFS tokens after ssh-ing into the cluster.
|
This section discusses how to set up and use SSH with public-key based authentication to allow for access to the cluster without typing your password for every ssh session. It can also be used to allow passwordless ssh between the various login and compute nodes; this is useful if you are using ssh to spawn processes on the different nodes allocated to your job.
The procedures listed in this section are NOT required for you to access the system, and you can use normal password based authentication instead. It also is NOT required for most multinode jobs (using MPI or srun). It is only for users who wish to set up public-key authentication, either to allow passwordless access to the cluster from your workstation, or to allow passwordless ssh between the nodes of the cluster.
If you are new to Unix/Linux, it is recommended that you skip this section and just use password based authentication.
Public-key authentication uses asymmetric encryption for its security. In asymmetric cryptographic systems, there exist distinct private and public keys. Data encrypted or digitally signed with the private key can only be decrypted, or have its signature verified, with the corresponding public key. The private key is kept secure on your workstation, and the public key is copied to the HPC cluster. When you log in with public-key authentication from your workstation, the ssh client on your workstation will digitally sign a standard message and send this to the sshd server process on the cluster along with the appropriate public key. The remote sshd process verifies that the public key is authorized, and if so verifies that the digital signature is valid. If the signature is valid, that means the ssh client is in possession of the private key corresponding to the public key, and because the public key is authorized, sshd grants it access to the system.
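The sign-then-verify idea can be tried by hand. The sketch below is an illustration under stated assumptions: it uses the openssl command-line tool (assumed to be installed) rather than ssh itself, the function name demo_sign_verify is hypothetical, and the message text is arbitrary. It is not how ssh is implemented, only the same principle on a small scale.

```shell
# Illustration only: sign a message with a private key, then check that the
# public key verifies the genuine message and rejects an altered one.
demo_sign_verify() {
    workdir=$(mktemp -d) || return 1

    # 1. Create a key pair: private.pem stays secret, public.pem can be shared.
    openssl genrsa -out "$workdir/private.pem" 2048 2>/dev/null || return 1
    openssl rsa -in "$workdir/private.pem" -pubout \
        -out "$workdir/public.pem" 2>/dev/null || return 1

    # 2. Sign a message with the private key.
    printf 'login request\n' > "$workdir/msg.txt"
    openssl dgst -sha256 -sign "$workdir/private.pem" \
        -out "$workdir/msg.sig" "$workdir/msg.txt" || return 1

    # 3. Anyone holding the public key can verify the signature.
    if openssl dgst -sha256 -verify "$workdir/public.pem" \
        -signature "$workdir/msg.sig" "$workdir/msg.txt" >/dev/null 2>&1; then
        good=0
    else
        good=1
    fi

    # 4. Verification fails once the message has been altered.
    printf 'tampered request\n' > "$workdir/msg.txt"
    if openssl dgst -sha256 -verify "$workdir/public.pem" \
        -signature "$workdir/msg.sig" "$workdir/msg.txt" >/dev/null 2>&1; then
        bad=0
    else
        bad=1
    fi

    rm -rf "$workdir"
    # Success means: the genuine message verified and the altered one did not.
    [ "$good" -eq 0 ] && [ "$bad" -ne 0 ]
}

if demo_sign_verify; then
    echo "signature check behaves as described"
fi
```

Only the holder of private.pem can produce a signature that public.pem accepts, which is why the private key must be protected.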
|
NOTE: the public-key authentication process grants
login access to anyone/any system/etc. that knows your private key.
Thus you need to ensure the security of your private key. Protect
your private key at least as strongly as you would your password.
|
Instructions for setting up public key authentication are given below. We break them down into two cases: enabling passwordless ssh between the nodes of the cluster, and enabling passwordless ssh from your workstation to the cluster.
Although the process is essentially the same in each case, because of the shared home directory among the nodes of the cluster, the first case is a bit simpler and will be treated separately. The second case is slightly more complicated because there are steps which need to be done on both your workstation and on the HPC cluster.
Depending on your needs, you can do neither of these procedures, just one of them, or both.
In certain cases, it might be necessary to enable passwordless ssh between the nodes of the cluster. The most typical such case is if you must use the ssh command to launch processes on multiple nodes as part of your job. (Most multinode jobs use MPI and/or the srun command, and so do not require this, but some may.) Because you cannot feasibly enter a password within a batch script, you need to enable passwordless ssh in such cases.
Because your home directory is shared among all nodes in the HPC cluster, everything for this process can be done on one of the HPC login nodes. Just log into a login node on the cluster, and then:
1. Check whether you already have a key pair by running ls -l ~/.ssh/id_rsa. If ~/.ssh/id_rsa already exists, you should already have the keys, and you can skip running the ssh-keygen command, although you should still verify the ownership and permissions of the file. If you regenerate the key anyway, you will need to update the authorized_keys file on any other systems that use it.
2. If the file does not exist, generate a key pair with the ssh-keygen command. Accept the default value for the name of the file in which to save the key; the public key will be stored in the same directory, with a .pub extension. Generally, you should not enter a passphrase (just hit return at the two passphrase prompts); the most common case is to enable passwordless ssh among nodes in the cluster for use in job scripts, and that will not work if the key has a passphrase.
3. Repeat the ls -l ~/.ssh/id_rsa command from above. It should now show the existence of the file. Please verify that the file is owned by you, and that no one can access it but you. (Permission flags should be -rw-------.)
4. Add the public key to your authorized_keys file. If ~/.ssh/authorized_keys already exists, you should append the contents of ~/.ssh/id_rsa.pub to the file; you can do this with the command 'cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys'. If the authorized_keys file does not exist, you can create it with the proper contents with the command 'cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys'.
|
Please ensure that your
id_rsa file is only readable by
you. This host key is all that is needed to access any system which
has the corresponding public key added to its authorized_keys
file.
|
You should now be able to ssh between nodes in the cluster without entering a password. Note that you do not have access to ssh to compute nodes unless you have a job currently running on the node. Typically this passwordless ssh is only needed to ssh between nodes allocated to a job within your job script.
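The procedure for passwordless ssh between cluster nodes can be collected into one small function. This is a sketch rather than an official script: the function name setup_cluster_key is hypothetical, the optional directory argument exists only so it can be tried without touching real keys, and the check that avoids adding the same key twice is an addition of this sketch.

```shell
# Sketch of the key setup on a login node. With no argument it works on
# ~/.ssh; pass another directory to experiment safely.
setup_cluster_key() {
    ssh_dir="${1:-$HOME/.ssh}"
    mkdir -p "$ssh_dir" || return 1

    # Generate a key pair only if one does not already exist.
    # -N "" means no passphrase, as needed for use inside job scripts.
    if [ ! -f "$ssh_dir/id_rsa" ]; then
        ssh-keygen -q -t rsa -N "" -f "$ssh_dir/id_rsa" || return 1
    fi

    # The private key must be readable and writable by you alone.
    chmod 600 "$ssh_dir/id_rsa"

    # Authorize the public key, unless it is already listed.
    touch "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"
    if ! grep -qxF -- "$(cat "$ssh_dir/id_rsa.pub")" "$ssh_dir/authorized_keys"; then
        cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
    fi
}

# Try it in a scratch directory first:
scratch_ssh=$(mktemp -d)
setup_cluster_key "$scratch_ssh" && ls -l "$scratch_ssh"
```

Calling setup_cluster_key with no argument would apply the same steps to your real ~/.ssh directory.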
Although the process in this case uses basically the same steps as above, because your workstation and the HPC cluster do not share a common home directory, the different steps need to be performed on different systems, and there is a file transfer required.
Some of the steps below are dependent on the ssh application being used on your workstation/desktop/laptop/etc. We are assuming putty on Windows systems and the standard command line openssh client on Macs or Linux systems. If you are using something else, hopefully it has documentation describing how to set up public key authentication, which together with the instructions below can be used to figure out what you need to do.
1. Generate a key pair on your workstation.
   - On Windows systems, use the PuTTYgen utility to generate the key pair. Open the PuTTYgen utility, and
     - In the PuTTY Key Generator window, in the Parameters section, select the type of key to generate. It is recommended to use "RSA" (perhaps called "SSH-2 RSA" in some older versions) using the default (2048) number of bits.
     - Click Generate in the Actions section. You will be prompted to use your mouse/etc. to generate entropy that will help make the private key secure. Move the cursor around until the utility has generated the key (it will display in the area under Key).
     - If you wish to encrypt the private key, enter a passphrase in the Key passphrase and Confirm passphrase boxes. Encrypting the private key with a passphrase will increase security, and with the Pageant utility provided with PuTTY you only need to enter the passphrase once per login session on your workstation.
     - Click Save public key under Actions (next to Save the generated key). Enter the name (e.g. putty_public_key) and folder, and click Save.
     - Click Save private key under Actions (next to Save the generated key). If you did not opt to encrypt the private key, it will ask if you are sure that you wish to save an unencrypted private key. "Yes" will proceed to save; "No" will allow you to go back and specify a passphrase to use. Use the default format/Save as type (PuTTY Private Key Files (*.ppk)), enter the name (e.g. putty_private_key) and folder, and click Save.
   - On Linux or Mac systems, run the command ssh-keygen -t rsa.
     - You will be asked for the file in which to save the key; the default is id_rsa in the .ssh subdirectory of your home directory. It is recommended that you use this default; otherwise you need to tell the ssh or ssh-agent command which identity file to use when logging in.
     - The public key will be saved under the same name with a .pub suffix added.
     - You will be asked for a passphrase with which to encrypt the private key. Encrypting the private key increases security, and with ssh-agent (described below) you only need to enter the passphrase once per login session on your workstation. If you opt not to use a passphrase, just hit return without entering anything to leave the private key unencrypted.
2. Transfer the public key file to the cluster. For Linux or Mac users, this will be ~/.ssh/id_rsa.pub. For Windows users, it will be the name used when you saved the public key (e.g. putty_public_key). DO NOT transfer the private key file; that should remain in a protected spot on your workstation.
3. Log into the cluster and do the following:
   - Run mkdir -p ~/.ssh to create the .ssh directory under your home directory if it does not already exist (the command will not harm anything if it does already exist).
   - Run touch ~/.ssh/authorized_keys. If there is no authorized_keys file in the .ssh subdirectory of your home directory, this command will create an empty file. If there was one, it does not change the contents of the file.
   - Run chmod 600 ~/.ssh/authorized_keys to ensure the proper permissions on the file. No one but you should be able to read or write to the file.
   - Run cat PUBLIC_KEY_FILE >> ~/.ssh/authorized_keys to append the public key file to the authorized_keys file. Be sure to use TWO > characters without space between them in the above command (otherwise you might overwrite the file and lose previous contents). The PUBLIC_KEY_FILE in the above command should be replaced by the name of the public key file you just copied to the cluster; e.g. id_rsa.pub for Linux or Macs, or putty_public_key or whatever you saved it as on Windows.
   - Run rm PUBLIC_KEY_FILE to delete the copied public key file, as it is no longer necessary.
|
NOTE: Your private key is all that someone needs
to access the cluster as you. KEEP THE PRIVATE KEY FILE SECURE.
It is strongly suggested that you encrypt the private key file with
a passphrase, so that both the passphrase and the file are needed to
access the cluster.
|
You should now be able to ssh to the Zaratan cluster from your workstation using public key authentication, as described below.
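The cluster-side part of the setup (create the directory, create the file, fix permissions, append the key) can be sketched as a single function. The name install_public_key and its optional second argument are hypothetical conveniences, and the duplicate check is an addition of this sketch. It assumes the key file is in the one-line OpenSSH format, as written by ssh-keygen.

```shell
# Sketch of the cluster-side steps. The second argument exists only so the
# function can be tried against a scratch directory instead of ~/.ssh.
install_public_key() {
    pubkey_file="$1"
    ssh_dir="${2:-$HOME/.ssh}"

    [ -r "$pubkey_file" ] || { echo "cannot read $pubkey_file" >&2; return 1; }

    mkdir -p "$ssh_dir" || return 1          # harmless if it already exists
    touch "$ssh_dir/authorized_keys"         # creates an empty file if missing
    chmod 600 "$ssh_dir/authorized_keys"     # readable and writable by you only

    # Append with >> (never >), skipping a key that is already present.
    if ! grep -qxF -- "$(cat "$pubkey_file")" "$ssh_dir/authorized_keys"; then
        cat "$pubkey_file" >> "$ssh_dir/authorized_keys"
    fi
}

# Example (on a login node, after copying the key over):
#   install_public_key ~/id_rsa.pub && rm ~/id_rsa.pub
```

A public key saved with PuTTYgen's Save public key button is typically in a multi-line SSH2 format instead; `ssh-keygen -i -f FILE` prints such a key in the one-line OpenSSH form, which can then be passed to the function.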
In this section we discuss how to use public-key authentication with ssh. Because it is good practice to encrypt your private key, typical usage involves starting an agent that runs in the background. You start the agent when you log into your workstation, and it loads your un-encrypted private key into memory at that time (it will need to ask you for the passphrase to decrypt it if it was encrypted). Then for the rest of the time you are logged into your workstation, it will provide the key to the ssh client so that you can connect to the cluster without providing a password. The agent differs according to what ssh client you use.
|
NOTE: If you log in to Zaratan using public-key authentication, you will not have Kerberos tickets or AFS tokens, so certain commands will not work. For example, you will not be able to access the contents of your SHELL storage. We recommend using kinit on your workstation to achieve a mostly "password-less" ssh capability which will provide you with Kerberos tickets and AFS tokens after ssh-ing into the cluster.
|
On Windows systems, there should be a Pageant SSH authentication agent installed with PuTTY.
1. In the PuTTY configuration window, click SSH under Connection to expand the SSH options.
2. Select the Auth subtree that now appears under SSH.
3. Start the Pageant SSH authentication agent. It runs in the background, so when it is open you will just see a new icon (a computer wearing a hat) for it in the Windows notification tray. Double click on that icon to open up the Pageant window.
4. Click Add Key to add the private key created in the previous section. This will bring up a file dialog, so find the private key you created (e.g. putty_private_key.ppk) and "open" it.
You can now use PuTTY to log into the cluster login nodes as before, and it will use public key authentication and not ask for your password on the cluster.
You might wish to have Pageant start up whenever you log in to your Windows workstation. To do that:
1. Right-click in the Startup folder, and select New and Shortcut.
2. In the Type the location of the item box, you should enter the path to pageant.exe followed by a space and the full path to your private key (e.g. putty_private_key.ppk). You should put both paths (the executable and the private key) in double quotes. E.g. "C:\Program Files (x86)\PuTTy\pageant.exe" "C:\Users\user_profile\ssh_key\putty_private_key.ppk"
3. Click Next and enter a name for the shortcut (e.g. Pageant).
Then, every time you log into the Windows workstation, it will start Pageant (prompting you for the passphrase for the private key if it is encrypted) and you can use PuTTY to ssh to the cluster login nodes without entering any additional passwords for the remainder of your workstation session.
On Linux or Mac systems, there are ssh-agent and ssh-add commands that should be standard. You can start the background agent by issuing the command ssh-agent. You can then add private keys to the agent using the ssh-add command. If you use the default location for your private key (e.g. ~/.ssh/id_rsa) you can just issue the command ssh-add and it will add any private keys in the default locations. If you added the key in a non-standard path, just use ssh-add PATH_TO_PRIVATE_KEY where PATH_TO_PRIVATE_KEY is the path to the private key file to use. If the private key is encrypted, the ssh-add command will prompt you for the passphrase to decrypt it.
At this point, you can just ssh USERNAME@login.zaratan.umd.edu (where USERNAME is your username on the Zaratan cluster) and you will be logged into a login node using public key authentication; i.e. no additional password needed.
You can combine the ssh-agent and ssh-add commands into a single shell script if you want, or add them to your .cshrc or .bashrc initialization scripts if you want them to start automatically when you log in to your workstation.
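A detail worth knowing when scripting this: ssh-agent started on its own prints the shell commands that set SSH_AUTH_SOCK and SSH_AGENT_PID, but does not apply them to your current shell, so the usual idiom is to eval its output. The sketch below assumes a Bourne-style shell such as bash (it is not suitable for .cshrc as written), and the function name start_agent_if_needed is hypothetical.

```shell
# Sketch for ~/.bashrc or similar: start an agent only if this shell does
# not already have a usable one.
start_agent_if_needed() {
    # Reuse an agent that this shell already knows about.
    if [ -n "${SSH_AUTH_SOCK:-}" ] && [ -S "$SSH_AUTH_SOCK" ]; then
        return 0
    fi
    # -s asks for Bourne-style output; eval applies the variable settings.
    eval "$(ssh-agent -s)" >/dev/null
}

start_agent_if_needed

# Then load your key (prompts for the passphrase if the key is encrypted):
#   ssh-add                      # default locations such as ~/.ssh/id_rsa
#   ssh-add PATH_TO_PRIVATE_KEY  # a key stored elsewhere
```

The ssh-add step is left as a comment because it is interactive when the key has a passphrase.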
We recognize that people will often start multiple ssh sessions to the cluster, and that typing in your password for each such session is annoying. While SSH public key authentication will allow you to log into the cluster without typing a password, you will not get Kerberos tickets or AFS tokens that way, and therefore you will not be able to access any data on the SHELL filesystem unless you subsequently issue the renew command, which will require you to enter a password, defeating the goal of passwordless login.
The recommended approach is to install a Kerberos client on your workstation and configure your SSH client to use GSSAPI/Kerberos authentication when connecting to the cluster. Then when you log onto your workstation each day you can issue a command to obtain a new set of Kerberos tickets, and when you ssh into the login nodes of the cluster it will not request a password, and the system will automatically obtain AFS tokens for accessing your SHELL storage based on the Kerberos tickets.
The configuration process depends on the operating system of your workstation:
|
This section is still under construction. The instructions
which follow have not been fully tested, but we expect that
they should at least mostly work. Please let us know if you
experience any difficulties.
|
On Windows systems, you will need to install a Kerberos client in order to get valid Kerberos tickets on your workstation. We recommend installing the Auristor OpenAFS client, as that will both provide a suitable Kerberos client and allow you (if so desired) to access SHELL storage from your Windows workstation.
1. During installation, use the following settings:
   - Default Cell should be set to shell.umd.edu (for the SHELL storage tier on Zaratan)
   - Integrated logon should be set to Disable
   - Cache size: keep the default
2. The next screen (Custom Setup) gives you options for what to install. Just use the defaults.
3. Click the Install button to proceed.
4. Click the Yes button to proceed.
5. Click the Finish button to exit the setup wizard.
After the system reboots, you can open a command prompt from the Start Menu and issue the command:
kinit MYUSERNAME@UMD.EDU
replacing MYUSERNAME with your login name on Zaratan (which should be the part of your @umd.edu or @terpmail.umd.edu email address to the left of the "at" sign (@), and will normally be all lowercase). The @UMD.EDU must be all uppercase. This will give you Kerberos tickets on your Windows workstation. This kinit step will need to be repeated every time you reboot your workstation (at least if you plan to use password-less ssh in that session), or when your Kerberos tickets expire (typically one day).
Although the above kinit step will obtain Kerberos tickets for you, you still need to configure your ssh client to authenticate to the remote system using these tickets. The steps to accomplish this depend on the specific ssh client you are using.
For the putty ssh client, do the following:
1. In the PuTTY configuration, expand SSH, then Auth, and then select the GSSAPI pane. On this pane, make sure the two boxes Attempt GSSAPI authentication and Allow GSSAPI credential delegation are checked.
2. Select Connection and then Data in the configuration menus, and in the field Auto-login username enter your username on the Zaratan cluster.
The above ssh client configuration should only need to be done once. After doing that, and assuming you have valid Kerberos tickets, you should be able to ssh into the Zaratan login nodes without an additional password prompt (although you will see a multi-factor prompt if not on the campus VPN).
The Kerberos client should already be installed on recent MacOS systems, so you should not need to do anything to install it.
The process to configure SSH to use Kerberos for authentication is the same on Macs as it is for Linux systems.
Many modern Linux distributions come with Kerberos clients automatically installed. You can verify this by issuing the command which kinit; if that returns a path to the kinit command, you should have the required packages already installed.
If the which kinit command returns an error saying something like kinit: Command not found, then a Kerberos client is not properly installed on your system. You should use whatever packaging system is appropriate for your distribution (e.g. dnf or yum for RedHat and Fedora-like systems, dpkg, apt or similar commands for Debian, Ubuntu, Mint and related systems) to find and install the appropriate package. The package names are distribution dependent, but typically include krb5 or heimdal in the name; you probably just need one package. You should not need to install any packages with server in the name for this. The krb5 packages typically use MIT's implementation of Kerberos, and the heimdal named packages use the Heimdal implementation; for this purpose you can use either implementation, and we recommend using whichever one is best supported for your distribution.
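The check for a Kerberos client, and the form of the name to give kinit, can be sketched as follows. The function names are hypothetical, johndoe is a placeholder username, and the two package names mentioned in the comment are common examples that do not come from this page, so confirm the right name for your distribution.

```shell
# Report whether a Kerberos client (the kinit command) is on the PATH.
have_kinit() {
    command -v kinit >/dev/null 2>&1
}

# Build the Kerberos principal: lowercase username, uppercase realm.
umd_principal() {
    printf '%s@UMD.EDU\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

if have_kinit; then
    echo "kinit found; to get tickets run: kinit $(umd_principal johndoe)"
else
    # Common package names (assumptions, not from this page):
    # krb5-workstation on the Fedora/RHEL family, krb5-user on Debian/Ubuntu.
    echo "kinit not found; install a Kerberos client package first"
fi
```

After a successful kinit, the klist command lists the tickets you currently hold.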
Once a Kerberos client is installed, you need to configure ssh to send your Kerberos credentials to the login nodes of the cluster. To do that, you can edit (or create if needed) a file named config in the directory .ssh under your home directory and add the following lines:
Host *.umd.edu
GSSAPIAuthentication yes
GSSAPIDelegateCredentials yes
The first line, Host *.umd.edu, specifies which hosts the configuration is restricted to, and the next two lines instruct ssh to attempt to log in via the Kerberos tickets on your workstation when connecting to the named hosts. If the attempt to log in via Kerberos tickets fails, it will fall back to requesting a password. If it can authenticate using Kerberos, it will also (due to the GSSAPIDelegateCredentials directive) forward your Kerberos tickets to the system you are logging into; this means that your login session on the remote system will have valid Kerberos tickets.
You should not forward your Kerberos tickets to untrusted systems; although Kerberos tickets are encrypted and do not contain your password, if a malicious user can access your Kerberos tickets, they can impersonate you until the tickets expire. Since your Kerberos credentials should not grant you access to any non-UMD system, in the snippet above we restrict ssh to only do Kerberos authentication and only forward tickets when connecting to a system in the umd.edu domain. You could change the string after the Host directive to e.g. tighten it to only a specific HPC cluster, but leaving it at *.umd.edu should be safe and will cover all UMD HPC clusters as well as the Glue/TerpConnect/GRACE systems.
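If you prefer to script the edit, a minimal sketch is shown below. The function name add_umd_gssapi_config is hypothetical, the optional argument exists only so the function can be tried on a scratch file, and setting the file to mode 600 is a precaution of this sketch rather than something this page requires.

```shell
# Append the GSSAPI stanza to an ssh config file unless a "Host *.umd.edu"
# section is already there. By default it edits ~/.ssh/config.
add_umd_gssapi_config() {
    config_file="${1:-$HOME/.ssh/config}"
    mkdir -p "$(dirname "$config_file")" || return 1
    touch "$config_file"
    chmod 600 "$config_file"

    # Leave an existing section alone so hand-made settings are not duplicated.
    if grep -q '^Host \*\.umd\.edu$' "$config_file"; then
        echo "Host *.umd.edu section already present; edit it by hand" >&2
        return 0
    fi

    cat >> "$config_file" <<'EOF'
Host *.umd.edu
    GSSAPIAuthentication yes
    GSSAPIDelegateCredentials yes
EOF
}

# Example: add_umd_gssapi_config          (edits ~/.ssh/config)
```

On OpenSSH clients that support the -G option, `ssh -G login.zaratan.umd.edu | grep -i gssapi` prints the settings ssh would actually use for that host, which is a quick way to confirm the stanza took effect.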
If a section for the desired host expression already exists, you can just add the GSSAPIAuthentication and GSSAPIDelegateCredentials lines into the existing section. You might also wish to add the lines:
ServerAliveInterval 60
ForwardAgent yes
ForwardX11 yes
ForwardX11Trusted yes
The first line will help reduce SSH timeouts due to inactivity of the terminal. The various Forward* lines have to do with forwarding X11 graphical connections and similar. None of these lines are required to do "passwordless" authentication, but you might find them useful.
This section discusses running graphical applications on the login nodes of the cluster with the graphics appearing on your desktop using the network features of the X11 Windowing System. In order for this to work, you need to be running an X11 server on your desktop, which we discuss below. While the procedures described below should still work, we recommend instead that you look into using the Interactive Desktop of the OnDemand portal; this will allow you to start up an interactive graphical job on a compute node with the graphics displaying in a window on your web browser. This is usually significantly easier to use than setting up X11 as described below.
The exact procedure for installing and running an X11 server on your local workstation depends on what your desktop, etc. is, and you might wish to contact the UMD helpdesk for assistance.
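Once an X11 server is running on your workstation and you have logged into a login node, check that the DISPLAY environmental variable is set (e.g. echo $DISPLAY should return something, usually your hostname followed by a colon and a number). You might need to add a -X to your ssh command to allow ssh to forward X11 connections.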
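A quick way to run this check after logging in is sketched below. The function name x11_ready is hypothetical, and USERNAME stands for your cluster username.

```shell
# Succeeds when DISPLAY is set to a non-empty value, i.e. when X11
# applications started from this shell have somewhere to draw.
x11_ready() {
    [ -n "${DISPLAY:-}" ]
}

if x11_ready; then
    echo "X11 forwarding looks active: DISPLAY=$DISPLAY"
else
    echo "DISPLAY is not set; reconnect with: ssh -X USERNAME@login.zaratan.umd.edu"
fi
```

A set DISPLAY only shows that forwarding was requested; the X11 server on your workstation must also be running for windows to appear.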
Some software packages have their own protocols for remote visualization, including:
This is a very short list of some very basic Unix commands to help you get started in the HPC environment.
- To list the files in a directory, use the ls command. With no arguments, it lists the files in the current directory. You can give it a -l option to show more detail, and/or a file or directory name to show that file or directory.
- To get help on a command, use the man command. E.g., you can use man ls to find out all about the ls command and its options.
- To see how much disk space you are using, use du. We recommend using the options --si --apparent-size followed by the name of the top directory. E.g. the command du --si --apparent-size ~ will display the usage of your home directory (~ is a shorthand alias for your home directory) and all directories underneath it (excluding stuff like ~/scratch and ~/SHELL which are only symlinks to other filesystems).
- To create a directory, use the mkdir command. Give it the name of the new directory to make, e.g. mkdir foo.
- To change to another directory, use the cd command. Give it the name of the directory to change to, e.g. cd foo. You can also use the pushd and popd commands to temporarily go to another directory and return. If I am in the directory foo and I issue pushd /export/lustre/newfoo, do some stuff, then popd, I will be back in the foo directory I started in.
- To copy a file, use the cp command. Usage is cp FILE1 FILE2, which will copy an already existing file FILE1 to FILE2.
- To rename or move a file, use the mv command, e.g. mv FILE1 FILE2 to rename FILE1 to FILE2.
- To transfer files to or from other systems, use sftp and/or scp. If you are on one of the Zaratan login nodes and wish to transfer files to/from a system which can act as a server for the scp or sftp protocols, you do one of
  - scp FILE1 USERNAME@RHOST:FILE2 to copy FILE1 on a Zaratan login node to the system RHOST, putting it with the name FILE2 on the remote system. You can leave out FILE2 to keep the same name (but be sure to keep the colon (:)). USERNAME is your username on the remote system; you can omit the USERNAME@ if it is the same as on the HPC system.
  - scp USERNAME@RHOST:FILE2 FILE1 does the reverse; it copies FILE2 from RHOST to the login node.
  - sftp RHOST will start an interactive SFTP session from one of the Zaratan cluster's login nodes to the SFTP server on system RHOST. You can use the get command to copy files from RHOST to the local system, or put to copy files from the local system to RHOST. For more information, see the tutorial on using sftp.
- To view the contents of a text file, use the less command. You can use something like less FILE1 to view the contents of the text file FILE1. You can use the space key to page forward, the "b" key to page back, up and down arrows to scroll a line up or down, and "q" to exit. You can also use it to read long output from another command, e.g. squeue | less to allow you to page through all the jobs running on the system.
- To edit a text file, use a text editor. Unless you are already familiar with vi or emacs, we recommend that you stick with nano.
  - nano: This is a simple, WYSIWYG text editor, similar to the default text editor for the old pine email client. nano is an open-source rewrite of the old pico editor, with some enhancements; most documentation for pico will apply to nano as well (just replace pico with nano in the examples).
  - vi or vim: These are more powerful text editors, with vim being a clone of the original vi editor, but with more power comes additional complexity.
  - emacs: This is an implementation of the powerful Emacs text editor. Again, while this is a very powerful text editor, it is also complex to learn.
- To log out of the system, use the logout command.
Hopefully, the above list will get you through most of the basic tasks you need to complete. A full tutorial on using Unix is beyond the scope of this document. However, there are many tutorials for beginning to use Unix on the web. A few tutorials we recommend are:
The shell is the command line prompt that you interact
with. It also is what processes shell scripts. Although there are a number
of shells available on DIT-maintained Unix systems, the most common are
tcsh and
bash. Your default
shell is the shell you get when you log into a system; by default
on the Zaratan cluster
it is set to /bin/bash
. (NOTE: This is a change
from the previous clusters, e.g. Deepthought2)
To change your default shell on Zaratan, you currently need to submit a help ticket to HPC staff; be sure to include your username on the cluster and the new login shell you wish to use. At some point in the future, we hope to implement a process to allow you to change your shell without administrator intervention.
chsh
command. To ensure that, you should run
renew
first, and enter your password if requested.
If you do NOT have valid credentials, the chsh
command
will fail with an error message about being unable to create a file
and ask if you are authenticated. Run renew
and
try again.
chsh
command also allows you to change your room and
phone numbers as listed in the Unix password file (i.e. the output of the
finger
command). You can just hit return to keep the values
unchanged.
chsh
command, it can take up to an hour for
the changes you request to take effect. And for changing the default shell,
this will only impact future logins.
/bin/tcsh
or
/bin/bash
.
In order to provide an environment that can support a wide variety of
users, many software packages are available for your use.
However, in order to simplify each individual user's environment, a
large number of those packages are not included in your default
PATH
.
Your account as provided gives you access to the basic tools needed to
submit and monitor jobs, access basic GNU compilers, etc.
If you need to make modifications to your login environment, you can do so
by modifying your .bashrc
file as necessary. (If you
changed your default shell to tcsh or csh you should
edit your .cshrc
file instead.)
If you do change these files, it is HIGHLY advised that you DO NOT remove the part near the beginning that sources the global definitions, e.g. the part like
# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
For packages that are not included in your default environment
(which is almost everything beyond basic Unix commands like ls, cat, and
editors) the module
command is provided. When run, this
command will modify your current session by adding the appropriate
entries to your PATH
and whatever other variables are
necessary to ensure the proper functioning of the package in question.
Note that these changes are temporary and only exist until you log
out. If you want a package
loaded for you automatically at each login,
add the command module load PACKAGE
to your
shell initialization scripts (e.g. .bashrc or .cshrc).
Because many research codes are complex and depend on many libraries, and loading different versions of the same library in a given package can cause serious or subtle errors, the software library often has multiple builds of packages to ensure a consistent set of libraries. The module command has some intelligence built into it to try to ensure that a consistent set of packages is loaded. In general, it works best if you first load the desired compiler (and version), then the MPI library (if you will be using MPI versions of packages), and then any other packages you wish to use.
The full names of the module files to load are actually rather long, and include the compiler and some other dependencies or variants of the build. However, it is usually sufficient to just give the package name in a module load. You might still wish to specify a version, especially in job scripts: without a version, the module load command will normally load the latest version (compatible with a previously loaded compiler and/or MPI library) installed on the cluster, which can change without notice.
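The recommended load order (compiler, then MPI library, then everything else) can be sketched as a short script. This is only an illustration: the module names gcc and openmpi are assumptions, no versions are pinned, and the guard around the module command is our own addition so that the script does nothing harmful on a machine where module is not defined.

```shell
# Placeholder module names; check the software page for what is
# actually installed, and append /VERSION to pin a version.
COMPILER="gcc"          # assumption: name of the compiler module
MPI="openmpi"           # assumption: name of the MPI module
PACKAGES="netcdf fftw"  # other packages, loaded last

# Compiler first, then MPI, then everything else.
load_order="$COMPILER $MPI $PACKAGES"

# "module" is normally only defined on the cluster, so guard the calls.
if type module >/dev/null 2>&1; then
    for m in $load_order; do
        module load "$m" || echo "could not load $m" >&2
    done
    module list
else
    echo "module command not found; would load, in order: $load_order"
fi
```

In a job script you would typically replace each bare name with a versioned one so that the job keeps using the same build even after newer versions are installed.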
The following additional subcommands for the module command are often useful:
module load PKG
command.
The software page contains a listing of the various packages available. If you click on the package name, you will get detailed information as to which versions are available, and whether they are available on the compute nodes of a particular cluster.
For example, if you want to run Matlab, you'll want to do the
following. Notice that Matlab is not available until after the
module load
command has been run.
f20-l1:~: matlab
matlab: Command not found.
f20-l1:~: module whatis matlab
matlab : Matlab 2014b
f20-l1:~: module load matlab
f20-l1:~: matlab
< M A T L A B >
Copyright 1984-2014 The MathWorks, Inc.
R2014b (8.4.0.150421) 64-bit (glnxa64)
September 15, 2014
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
>>
|
If you are running a Bourne or bash script under
sbatch but
your default shell is csh or tcsh , remember that
you must include a . ~/.bashrc (on the Zaratan cluster)
or a . ~/.bash_profile (on the Juggernaut cluster)
in your script to enable the module load and/or tap commands.
|
For more information, see the section on the module command.
It is strongly recommended that your dot files (e.g. .cshrc
and/or .bashrc
) do not produce any terminal
output, or at least, only do so when run in an interactive shell. Output
in non-interactive shells can cause problems, most notably with some file
transfer protocols like scp
and sftp
--- in most
cases, the stray output will confuse the file transfer protocol, usually
causing it to abort.
|
If commands in your
.cshrc , .bashrc , or
other dot files might produce output to the terminal, you should take measures
to ensure such output is only produced in interactive shells. Otherwise, you
might break file transfer protocols like sftp or scp .
|
If you have commands in your dot files which might produce output, you
should consider running them only in interactive shells so as not to confuse
file transfer protocols. The method varies by the type of shell; for
csh
and/or tcsh
style shells, something like:
if ( $?prompt ) then
#Only execute the following in interactive shells
echo "Hello, today is "
date
endif
#Here we redirect output (if any) to /dev/null
some_command >& /dev/null
For Bourne-like shells (e.g. sh
and/or bash
)
something like:
if [ ! "x$PS1" = "x" ]; then
#Only execute the following in interactive shells
echo "Hello, today is "
date
fi
#Here we redirect output (if any) to /dev/null
some_command > /dev/null 2> /dev/null
You can also redirect output to /dev/null
, as is done
in the case of some_command
in the above examples. Be sure
to remember to redirect both stdout
and stderr
.
In some cases, the command has options to silence it, which can also be
used (e.g. the old tap
has a -q
option), but
errors will generally still produce output.
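One way to check whether a dot file is quiet is to source it in a non-interactive shell and count the bytes it prints. The following is a minimal sketch rather than a site-provided tool: the function name dotfile_output_bytes and the two throwaway files are made up for illustration, and it assumes bash is installed.

```shell
# Print the number of bytes a dot file writes (stdout and stderr)
# when sourced by a non-interactive bash. 0 means it is quiet.
dotfile_output_bytes() {
    bash -c '. "$1"' _ "$1" 2>&1 </dev/null | wc -c | tr -d ' '
}

# Demonstration with two throwaway files (hypothetical contents).
tmpdir=$(mktemp -d)
printf 'echo "Hello"\n' > "$tmpdir/noisy_rc"
printf 'case $- in *i*) echo "Hello" ;; esac\n' > "$tmpdir/quiet_rc"

noisy=$(dotfile_output_bytes "$tmpdir/noisy_rc")
quiet=$(dotfile_output_bytes "$tmpdir/quiet_rc")
echo "noisy_rc: $noisy bytes, quiet_rc: $quiet bytes"

rm -rf "$tmpdir"
```

The second file uses the `case $- in *i*)` test, which is another common way for Bourne-like shells to detect an interactive session. To check your own file you could run `dotfile_output_bytes ~/.bashrc`; any result other than 0 suggests that scp or sftp may have trouble.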
On each cluster, you have several options available to you regarding where files are stored. A whole section is devoted to this on another page, but it is important and basic enough that a short discussion is merited here.
With access to the cluster, you are granted a home directory, which is the directory you see when you first log in. It is distinct from your standard TerpConnect/Glue home directory. Also, if you have access to more than one HPC cluster, your home directory on each is distinct from the other(s).
Your home directory is, by default, private to you, and should be used as little as possible for data storage. In particular, you should not run jobs out of your home directory --- run your jobs from a scratch filesystem; these are optimized to provide better read and write performance to improve the speed of your job. After the job is finished, you might wish to copy the more critical results files back to your home directory, which gets backed up nightly. (The scratch filesystems are NOT backed up.)
You should run jobs out of a scratch filesystem. On Zaratan, you have two choices of where to locate your data. The first space is private to you; the second is shared with the rest of the users in your project. All users will have at least these two spaces, and users that are part of more than one project may have additional spaces.
/scratch/zt1/project/PROJECT/user/USERNAME
/scratch/zt1/project/PROJECT/shared
where USERNAME is your Zaratan username and
PROJECT is your Zaratan project name. A link in your
home directory has been provided to give you easy access to your
private scratch space. You can access it via the ~/scratch
symbolic link (which just provides a different name with which you
can access the contents of the directory).
If you are a member of multiple projects, you'll have multiple links
of the form ~/scratch.PROJECT
.
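As a rough sketch of how these paths might be used from a script, the following builds both locations from a project and user name and creates a dated working directory under the private space. The values myproject and johndoe, and the myjob_ directory prefix, are placeholders chosen for illustration.

```shell
# Hypothetical values; substitute your own project and username.
PROJECT="myproject"
USERNAME="johndoe"

private_scratch="/scratch/zt1/project/${PROJECT}/user/${USERNAME}"
shared_scratch="/scratch/zt1/project/${PROJECT}/shared"

# Create a dated working directory, but only if the scratch space
# actually exists (i.e. we are on the cluster).
if [ -d "$private_scratch" ]; then
    workdir="$private_scratch/myjob_$(date +%Y%m%d)"
    mkdir -p "$workdir"
    echo "Run your job from: $workdir"
else
    echo "No directory $private_scratch here (not on Zaratan?)"
fi
echo "Shared project space: $shared_scratch"
```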
In addition to your home and scratch spaces, you also have SHELL
spaces. These spaces are intended for medium-term storage of data,
and are part of a networked filesystem that is also made available
to machines outside the cluster. You may install a client on your
local workstation or laptop that will provide you direct access to
these spaces.
More information on how to access your SHELL data remotely.
The SHELL filesystem is not optimized for high performance, and for that
and other reasons it is not mounted on the compute nodes, so your SHELL
directories are not accessible from within jobs. For each project
you belong to, you will normally have a symbolic link
~/SHELL.PROJECT
pointing to your
personal subfolder of PROJECT's SHELL tree.
All users should belong to at least one project, and so will have at least a personal scratch directory and personal SHELL directory for the project.
/afs/shell.umd.edu/project/PROJECT/user/USERNAME
/afs/shell.umd.edu/project/PROJECT/shared
Before long, you will need to transfer files to the HPC clusters from your computer, or from one of the HPC clusters to your computer, or otherwise move data to/from one of the HPC clusters to another location. There are several options for this.
The "standard" unix-like utilities for transferring data between nodes are
the scp
and sftp
programs, which implement the
Secure Copy (SCP) and Secure File Transfer Protocol (SFTP), respectively.
These are typically preinstalled on Unix-like systems. Windows or Mac users
might need to install a scp/sftp client
on their machine.
If you are transferring between clusters, generally both sides have the server processes running, and you can initiate the transfer from either side. When dealing with your workstation, it most likely does NOT have the server running, so regardless of which way you wish to transfer data, you will likely want to initiate the transfer from your workstation.
Assuming you are on your workstation and you have the client installed, just open up the client and point it to the Zaratan login nodes as described in the section on logging into the clusters, i.e.:
unix-prompt> scp -o User=payerle myfile login.zaratan.umd.edu:
Password:
myfile 100% 867 0.9KB/s 00:00
unix-prompt>
unix-prompt> sftp -o User=payerle login.zaratan.umd.edu:
Connecting to login.zaratan.umd.edu...
Password:
sftp> put morefiles*
Uploading morefiles1 to /home/payerle/morefiles1
morefiles1 100% 132 0.1KB/s 00:00
Uploading morefiles2 to /home/payerle/morefiles2
morefiles2 100% 308 0.3KB/s 00:00
sftp> quit
unix-prompt>
The above example shows how to transfer files using scp and sftp for the user payerle; you will obviously need to replace that username with your own.
This will by default allow you to move files to and from your home directory. For larger data sizes (more than a few GB) you will almost certainly wish to place them in scratch space or in your SHELL space.
Some more detailed information regarding the use of the scp command can be found in the section on basic Unix commands.
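To send a file straight to scratch space instead of your home directory, give scp the full destination path after the colon. The sketch below only builds and prints the command so that you can inspect it before running it; the username johndoe, the project myproject, and the file name bigdata.tar are placeholders.

```shell
# Hypothetical values; replace with your own.
CLUSTER_USER="johndoe"
PROJECT="myproject"
LOGIN_HOST="login.zaratan.umd.edu"
LOCAL_FILE="bigdata.tar"

# Private scratch directory on the cluster (trailing slash keeps the name).
dest="/scratch/zt1/project/${PROJECT}/user/${CLUSTER_USER}/"

# Build the command; it is printed, not executed.
cmd="scp ${LOCAL_FILE} ${CLUSTER_USER}@${LOGIN_HOST}:${dest}"
echo "$cmd"
```

Run the printed command from your workstation once the paths look right.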
For large amounts of data, you might wish to use Globus for transferring files. Globus can automatically use multiple streams (speeding up the transfers) and can automatically restart failed transfers (very useful when dealing with many GBs of data), and is supported by most HPC clusters.
You can also install the free Globus Connect Personal; see instructions here.
To use Globus, go to the login page https://globus.org/login, and log in. You can select "University of Maryland College Park" as your organization and log in with your University ID and password. Then select your endpoints (you might need to provide a username and password for the selected endpoint) and start transferring files.
More detailed instructions re using Globus
The use of Cloud Storage providers for storing data is becoming increasingly popular. The University provides free access to
These Cloud Storage providers can be particularly useful for storing archival data. Please see the service catalog entries above for more information on the services, and in particular for a discussion of what data is permissible to store there.
Each Cloud storage provider backend has slightly different procedures for uploading and downloading data. However, the rclone utility is a nice utility that can access many different Cloud service providers (including both Google Drive and Box) with a common command line interface.
More information about using rclone can be found here.
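As an illustration of that common interface, the sketch below copies a directory to a cloud remote with `rclone copy`. It assumes that rclone is on your PATH (on the cluster it may first have to be made available, for example through a module) and that you have already created a remote with `rclone config`; the remote name umd-box and both paths are hypothetical.

```shell
REMOTE="umd-box"                 # hypothetical remote name from "rclone config"
SRC_DIR="$HOME/results"          # hypothetical local directory to archive
DEST_PATH="hpc-archive/results"  # hypothetical folder on the remote

# Only attempt the copy if rclone exists and knows about the remote.
if command -v rclone >/dev/null 2>&1 &&
   rclone listremotes 2>/dev/null | grep -q "^${REMOTE}:"; then
    rclone copy "$SRC_DIR" "${REMOTE}:${DEST_PATH}" --progress
    transfer_status="attempted"
else
    transfer_status="skipped"
    echo "rclone or remote '${REMOTE}' not configured; nothing copied"
fi
```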
Compiling codes in an HPC environment can be tricky, as they often involve MPI and other external libraries. Furthermore, the various HPC clusters offer multiple compiler families, multiple versions of the compilers within each family, and multiple versions of many libraries. All of this can make compilation even more complicated.
We have simplified this a bit with the introduction of "toolchains". These are collections of compilers and related libraries. The various toolchains are:
Warning: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors
, which is not encouraging.) It includes
Intel MPI and MKL. Note: It is expected that at some date
in the near future Intel will be dropping support for the legacy compilers.
Once you decide which toolchain you want to use, you should
module load
it. You can module load the entire toolchain,
or individually module load the various components it includes (the
toolchain module does not do anything special, it is just a shortcut
for loading a bunch of modules).
The following compiler commands are available on the Zaratan HPC cluster:
Compiler Family | MPI library | C compiler | C++ compiler | Fortran77 compiler | Fortran90 compiler |
---|---|---|---|---|---|
GNU | none | gcc | g++ | gfortran | gfortran |
GNU | OpenMPI | mpicc | mpic++ | mpifort | mpifort |
GNU | Intel MPI | *** Illegal combination *** | |||
Intel (legacy) | none | icc | icpc | ifort | ifort |
Intel | Intel MPI | mpiicc | mpiicpc | mpiifort (NOTE the doubled i) | mpiifort (NOTE the doubled i) |
Intel (new) | none | icx | icpx | ifx | ifx |
Intel | Intel MPI | mpiicc | mpiicpc | mpiifort (NOTE the doubled i) | mpiifort (NOTE the doubled i) |
If you have any external libraries you need to use, you need to module load or tap these as well. Some libraries have specific versions compiled with and for a specific compiler/MPI library combination; in such cases you need to pick a version which matches what you are using. Not all combinations exist; if yours does not you can submit a help ticket requesting that combination. We generally try to avoid doing this for old versions of compilers/packages/etc. unless there are extraordinary reasons, so you are generally advised to try to use the latest versions available on the system. Fortran90 codes are particularly sensitive to this, and the *.mod files between different versions of the same compiler might not be compatible (and are definitely not compatible across compilers).
For packages which are libraries used by other packages (e.g. LAPACK, NETCDF, etc), the module load command is not enough. The package being compiled needs to know where to find the libraries, and the way to inform the package of that depends on the build system used by the package.
Doing a module load or tap for an external library generally only defines some environmental variables for you; you still need to instruct the compiler where to find any needed include files and where to find the libraries. Generally, "module help MODULENAME" or "tap TAPNAME" will give a brief usage summary giving the names of these variables. Typically there is a variable with a name like "*INC" or "*INCDIR", and another one with a name like "*LIB" or "*LIBDIR". E.g. the netcdf package defines NETCDF_INCDIR and NETCDF_LIBDIR. The package fftw/3 on the Deepthought clusters defines FFTWINC and FFTWLIB.
The "INC" variable provides the location of the include directory for the package. You will generally want to add arguments consisting of these variables preceded by the -I flag, either directly to your compilation command or to the CFLAGS and FFLAGS variables in your makefile. The -I flag takes a single path, so you should repeat it if you have multiple library packages that you are using.
login-1:~: gcc -c -I$NETCDF_INCDIR -I$FFTWINC my_netcdf_code.c
CFLAGS= -I$(NETCDF_INCDIR) -I$(FFTWINC)
FFLAGS= -I$(NETCDF_INCDIR) -I$(FFTWINC)
The "LIB" variables work similarly, except these are needed when the compiler/linker creates the executable. For small codes, the compilation and linking usually occur in a single step; larger codes, especially those with makefiles, usually break this into separate steps. Also, the dynamic linker needs to know where the library files are when you run the code, so it is usually easiest to set the "rpath" during compilation. (Otherwise you will need to set the environmental variable LD_LIBRARY_PATH each time before you run the code). To tell the compiler where to find the library for linking, you provide the "LIB" variables as arguments to the "-L" flag. To set the rpath, you provide them as arguments to the "-Wl,-rpath," flag. Both the -L and -Wl,-rpath, take a single path, so you should repeat these flags for each library package.
For the simple case, you would do something like
login-1:~: ifort -o mycode -I$NETCDF_INCDIR -I$FFTWINC \
-L$NETCDF_LIBDIR -L$FFTWLIB \
-Wl,-rpath,$NETCDF_LIBDIR -Wl,-rpath,$FFTWLIB my_netcdf_code.f90
Since this compiles and links my_netcdf_code.f90 in a single step, we need to provide both the compile stage (-I) flags and the link stage (-L and -Wl,-rpath) flags.
More complicated cases typically use makefiles, and here you typically will just do something like:
CFLAGS= -I$(NETCDF_INCDIR) -I$(FFTWINC)
FFLAGS= -I$(NETCDF_INCDIR) -I$(FFTWINC)
LDFLAGS= -L$(NETCDF_LIBDIR) -Wl,-rpath,$(NETCDF_LIBDIR)
LDFLAGS+= -L$(FFTWLIB) -Wl,-rpath,$(FFTWLIB)
Here we have the CFLAGS and FFLAGS definitions from above (which will be used in the compilation stage), and we put the -L and -Wl,-rpath flags in the LDFLAGS variable (we did this in two steps to make it more readable).
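If you build from the command line or a shell script rather than a makefile, the same pairing of -L and -Wl,-rpath flags can be generated by a small helper. This is a sketch under assumptions: the function name rpath_flags is made up, and the /path/to/... defaults only stand in for the values that module load would normally define.

```shell
# Placeholder paths; on the cluster "module load" defines these variables.
NETCDF_LIBDIR="${NETCDF_LIBDIR:-/path/to/netcdf/lib}"
FFTWLIB="${FFTWLIB:-/path/to/fftw/lib}"

# Print "-L<dir> -Wl,-rpath,<dir>" for every directory given.
# The body runs in a subshell so its variables stay local.
rpath_flags() (
    flags=""
    for dir in "$@"; do
        flags="$flags -L$dir -Wl,-rpath,$dir"
    done
    printf '%s\n' "${flags# }"
)

LDFLAGS=$(rpath_flags "$NETCDF_LIBDIR" "$FFTWLIB")
echo "LDFLAGS=$LDFLAGS"
```

The link step would then look something like `gcc -o mycode my_netcdf_code.o $LDFLAGS -lnetcdf`. Note that the link command also needs a -l flag naming each library (for example -lnetcdf for the netCDF C library; check each package's documentation for the exact names), since -L only adds a search directory. After linking, `readelf -d mycode` should list the directories in an RPATH or RUNPATH entry.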
If you opt not to go the "rpath" route, and instead compile the code with something like
CFLAGS= -I$(NETCDF_INCDIR) -I$(FFTWINC)
FFLAGS= -I$(NETCDF_INCDIR) -I$(FFTWINC)
LDFLAGS= -L$(NETCDF_LIBDIR) -L$(FFTWLIB)
(note that the -Wl,-rpath
arguments are missing in the above),
then before running the resulting binary (which we will call
myprog
) you will need to set LD_LIBRARY_PATH
appropriately. This (and the module loads which precede it) will need to
be done in every interactive or batch session which plans to run the code.
E.g.
login-1: echo $LD_LIBRARY_PATH
LD_LIBRARY_PATH: Undefined variable.
login-1: ./myprog
./myprog: error while loading shared libraries: libfftw.so.2: cannot open shared object file: No such file or directory
login-1: module load fftw/2.1.5
login-1: module load netcdf
login-1: setenv LD_LIBRARY_PATH "${FFTWLIB}:${NETCDF_LIBDIR}"
login-1: ./myprog
Program runs successfully
If you do NOT use the "rpath" arguments shown earlier, every
time you run the program the variable LD_LIBRARY_PATH
must
be properly defined to point to the library locations, or you will get
an error like shown above. In general, you will need to load the
modules and issue the setenv
command once in every interactive
login session in which you will use it, and once in every batch script.
And you MUST set the directories correctly; if you, e.g., give the
$FFTWLIB
path for a different version of FFTW than the one
the code was compiled with, the binary might run, but it might
crash with a difficult to debug error at some seemingly arbitrary place.
Or perhaps even worse, it might run to a seemingly successful conclusion but
produce erroneous output.
We strongly recommend that, for your own code or for code that you are
compiling, you use the rpath
arguments shown earlier. The
LD_LIBRARY_PATH
option is, in our opinion, best used when
you do not have the option of compiling the code with the rpath
settings.
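The session shown earlier uses setenv, which is csh/tcsh syntax. Since bash is the default shell on Zaratan, a bash equivalent may be useful; the following is a minimal sketch in which the /path/to/... defaults are placeholders for the values that the module loads would define.

```shell
# Placeholder paths; on the cluster "module load" defines these variables.
FFTWLIB="${FFTWLIB:-/path/to/fftw/lib}"
NETCDF_LIBDIR="${NETCDF_LIBDIR:-/path/to/netcdf/lib}"

# Prepend the library directories. The ${VAR:+...} form appends the old
# value only when one exists, so no stray trailing colon is produced.
export LD_LIBRARY_PATH="${FFTWLIB}:${NETCDF_LIBDIR}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
echo "$LD_LIBRARY_PATH"
```

Avoiding the stray colon matters because the dynamic linker treats an empty entry in LD_LIBRARY_PATH as the current directory.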
Since you are using a High Performance Computing cluster, you most likely have long, compute-intensive jobs. These can generally benefit greatly from various optimization techniques. The general topic of code optimization is quite broad, far too large to give more than a cursory discussion here. You are encouraged to look at the many resources on the topic that exist on the web. Here we just discuss some specific details related to the UMD clusters.
A major performance feature of modern processors is the ability to vectorize certain operations. In this mode, an instruction is decoded once and operates on a stream of data. This allows for significant performance boosts. There are various vector instruction sets available on Intel processors, known by names like SSE and AVX (with numerical suffixes to specify the version of the instruction set). Selecting the correct optimization level can be tricky, as different processors support different vector instruction sets. This is further complicated by the mix of Intel and AMD processors on the cluster.
Currently the Zaratan cluster is fairly homogeneous, consisting of AMD Epyc chips with 128 cores per node which support AVX and AVX2, but not AVX512. We expect that this could change in the future as more nodes are added to the cluster.
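If you are unsure which vector instruction sets a particular node supports, you can ask the node itself. The sketch below reads the CPU flags that Linux exposes in /proc/cpuinfo and lists those beginning with "avx"; it assumes an x86 Linux system and GNU grep. Run it on the node where your job will execute, since login and compute nodes need not have identical processors.

```shell
# Collect the AVX-family flags advertised by the first CPU listed.
avx_flags=""
if [ -r /proc/cpuinfo ]; then
    avx_flags=$(grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' \
                | grep -i '^avx' | sort -u | tr '\n' ' ') || true
fi
echo "AVX flags: ${avx_flags:-none found}"
```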
OpenMP is an API for shared memory parallelization, i.e. for splitting a code into multiple threads running on the same node. Because the parallelization is limited to a single node, it is less powerful than other APIs (e.g. MPI) which can span multiple nodes, but is also much easier to code for.
Indeed, OpenMP is implemented by the compiler, and is generally invoked through various compiler directives and/or pragmas. E.g., if you have a C code with a for loop, you can add a pragma just before the start of the for loop which will tell the compiler to try to split the loop into multiple threads each running in parallel on a different processor core.
If you have a code with OpenMP directives, you need to tell the compiler to compile it with the OpenMP extensions enabled. The exact mechanism is compiler dependent:
Compiler Family | Flag to use OpenMP | Default number of threads (if OMP_NUM_THREADS not set, etc) |
---|---|---|
GNU compiler suite | -fopenmp | number of available cores |
Intel Compiler Suite | -qopenmp (-openmp in older releases) | number of available cores |
PGI Compiler Suite | -mp | 1 thread |
NOTE: If you are using the Intel compiler suite and the
Math Kernel Libraries (MKL), some of the MKL routines might use OpenMP even
if you did not request OpenMP in the main compilation. You can set the
environmental variable OMP_NUM_THREADS
to 1 to effectively
disable that if really desired.
By default, OpenMP will attempt to use as many threads as there are cores
on the system (except with PGI compilers, which default to one thread).
This can be problematic in some cases. At runtime, you can
set the environmental variable OMP_NUM_THREADS
to an integer
to control the maximum number of threads that OpenMP will use.
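In a batch job, a common pattern is to tie the thread count to the number of cores the scheduler allocated to the task. The following is a minimal sketch: SLURM_CPUS_PER_TASK is set by Slurm only when --cpus-per-task is requested, so the example falls back to a single thread otherwise, and my_openmp_program is a hypothetical binary.

```shell
# Use the per-task core count from Slurm if present, otherwise 1 thread.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "Running with $OMP_NUM_THREADS OpenMP thread(s)"

# ./my_openmp_program    # hypothetical OpenMP executable
```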
NOTE: Be sure to use the -openmp
or
-fopenmp
flag (as appropriate for the compiler you are using) on all
of the compilation AND link stages for your code.