The Deepthought2 cluster

News for UMD High Performance Computing

2021 Feb 05
Due to a cooling failure in the data center housing Deepthought2, the cluster was shut down for protection from about 8:10 AM to about 9:30 AM. We apologize for any inconvenience.
2020 Nov 10
Part 2 of the virtual MATLAB workshop (scheduled for 11/17/20) has been postponed due to technical problems. When the problems are resolved, we will inform HPC users of the new date (by January 2021 at the latest).
2020 Oct 30
The "rolling upgrade" of the Deepthought2 cluster from RHEL6 to RHEL8 is going ahead full steam. About 60% of the compute nodes have already been upgraded. As of the morning of Friday, 30 Oct 2020, we are switching the default behaviors to favor RHEL8. In particular,
  • ssh login.deepthought2.umd.edu will now put you on a RHEL8 login node. For now, you can still use ssh rhel6.deepthought2.umd.edu to access a RHEL6 login node (and ssh rhel8.deepthought2.umd.edu will still put you on a RHEL8 node).
  • You can still give a --constraint=rhel6 or --constraint=rhel8 flag to sbatch, etc. to specify that you want your job to go to RHEL6 or RHEL8 compute nodes, but now jobs not explicitly specifying an OS version will default to going to RHEL8 nodes.
WARNING

Note: The host key for login.deepthought2.umd.edu has changed. You will likely see a scary looking message when attempting to ssh, stating that the "REMOTE HOST IDENTIFICATION HAS CHANGED!". You should never ignore such messages, but in this case it is likely not an issue. Please see the list of valid key fingerprints for login nodes, and verify that the fingerprint in the ssh error message matches one of them. If so, it is safe to remove the "offending" key listed in the ssh error (i.e. the error should state something like Offending RSA key in /home/payerle/.ssh/known_hosts:5).

You can remove the offending key with the command ssh-keygen -f ~/.ssh/known_hosts -R login.deepthought2.umd.edu or just open the file and delete the line indicated after the colon (:) in the error message (5 in this case); the line should begin with login.deepthought2.umd.edu.

You can then ssh again. You will likely get a slightly less scary warning from ssh, this time saying it cannot establish the authenticity of the host, and again giving a host key fingerprint. If it is the same fingerprint as before, or another valid fingerprint in the list of valid key fingerprints for login nodes, you can respond yes and ssh will continue to the login node (and will record the fingerprint for future reference).
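As a concrete illustration, the stale-key removal can be exercised safely on a throwaway known_hosts file. The key below is generated locally just for the demonstration; on a real system you would run the same `ssh-keygen -R` command against your actual ~/.ssh/known_hosts, as described above.

```shell
# Sketch: remove a stale entry for login.deepthought2.umd.edu from a
# known_hosts file. A throwaway file and a locally generated dummy key
# are used so nothing touches your real ~/.ssh/known_hosts.
KH=$(mktemp)                               # stand-in for ~/.ssh/known_hosts
KEYF=$(mktemp -u)
ssh-keygen -q -t ed25519 -N '' -f "$KEYF"  # dummy host key pair

# Record a (fake) host key for the login node, plus one unrelated host.
printf 'login.deepthought2.umd.edu %s\n' "$(cut -d' ' -f1,2 "$KEYF.pub")" >> "$KH"
printf 'otherhost.example.edu %s\n'      "$(cut -d' ' -f1,2 "$KEYF.pub")" >> "$KH"

# Drop every entry for the login node; other hosts are left untouched.
ssh-keygen -f "$KH" -R login.deepthought2.umd.edu
```

After the `-R` run, the file no longer contains a line for login.deepthought2.umd.edu, while the otherhost.example.edu entry survives (ssh-keygen also keeps a backup of the original file as "$KH.old").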

See this section for more information on the RHEL6 to RHEL8 upgrade.

2020 Oct 22
DIT is pleased to offer a virtual MATLAB workshop on November 10 (part 1) and November 17 (part 2). Part 1 will address code parallelization in general and is open to all MATLAB users at UMD. Part 2 will focus on using MATLAB Parallel Server on our HPC clusters. For details and registration see Parallel Computing with MATLAB Workshop 2020.
2020 Sep 1
We are starting a rolling upgrade of the Deepthought2 cluster's operating system, going from the rather old RHEL6 to RHEL8. At this time, one login node and about a dozen compute nodes are running RHEL8. We are hoping to have about half of the nodes running RHEL8 by 15 Oct, and about 90% by 15 Nov. The goal is to have the entire cluster running RHEL8 by 1 Jan 2021.
  • You can ssh to rhel8.deepthought2.umd.edu to go to a RHEL8 login node. The hostname rhel6.deepthought2.umd.edu will point to a RHEL6 login node. Currently login.deepthought2.umd.edu will also point to an RHEL6 node, but this will change to point to an RHEL8 login node at some point (probably late Oct/early Nov).
  • Both the RHEL6 and the RHEL8 nodes (compute and login) will share a home directory and lustre directories, so you can access your same files from either.
  • Both the RHEL6 and the RHEL8 nodes will share the same scheduler. So you can submit jobs for either set of nodes from either RHEL6 or RHEL8 login nodes. You can also monitor jobs from any of the login nodes.
  • Your allocations will have access to both RHEL6 and RHEL8 nodes. SUs are shared across the nodes, so if you have a 100 kSU allocation, you could do any one of the following:
    • Submit 100 kSU worth of jobs to RHEL6 nodes.
    • Submit 100 kSU worth of jobs to RHEL8 nodes.
    • Submit 50 kSU worth of jobs to RHEL6 nodes, and 50 kSU to RHEL8 nodes.
    • Submit jobs to both RHEL6 and RHEL8 nodes, such that the total submitted to RHEL6 nodes plus the total submitted to RHEL8 nodes adds up to 100 kSU.
  • Software will likely be incompatible between the two OS levels, due to differences in system libraries. We have built a new software library for RHEL8, and will continue to update it during this semester. Please contact system staff if a required package is missing. We are no longer adding new software or upgrading software on the RHEL6 side.
  • Software you built yourself will likely need to be recompiled in order to work on the new system.
  • Because of the software incompatibilities, you will likely need to send a job specifically to RHEL6 or to RHEL8 nodes. To facilitate this, we have added "rhel6" and "rhel8" features to nodes to allow you to specify which type of nodes are desired. For now, jobs not specifying a constraint will default to using "rhel6". You should not rely on the default behavior as we will likely be changing it in late October/early November.
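To make the constraint mechanism concrete, a job script pinned to RHEL8 nodes might look like the following sketch. The script name (myjob.sh), job name, and resource requests are assumptions for illustration, not site requirements; only the --constraint flag comes from the announcement above.

```shell
# Write an illustrative job script that pins the job to RHEL8 nodes.
# (myjob.sh, the job name, the single task, and the 10-minute limit
# are all assumed values for this example.)
cat > myjob.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=rhel8-test
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --constraint=rhel8

hostname   # shows which node (and hence which OS image) ran the job
EOF

# Submit it:                        sbatch myjob.sh
# Or override the OS at submit:     sbatch --constraint=rhel6 myjob.sh
```

Passing --constraint on the sbatch command line overrides the value embedded in the script, which is convenient for trying the same job on both OS levels without editing the file.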
2020 July 15
DIT is pleased to offer a series of introductory workshops for new users of our clusters, especially those with limited HPC and/or Unix experience.
  1. "Introduction to High Performance Computing", offered on July 28 and Aug 25;
  2. "Introduction to Unix", offered on Aug 11, 2020 and Sep 8, 2020;
  3. "Introduction to Compiling Software", offered on Sep 22, 2020;
  4. "Introduction to Python", offered on Oct 6, 2020.
Please follow the links to the workshop pages for more information and links to registration. The workshop series will be offered multiple times a year.
2020 June 24
Due to the overwhelming interest shown in the June 30 "Introduction to High Performance Computing" workshop, we have opened up two more dates: July 28 and Aug 25. Please see the main page for this workshop for more information and links to registration.
2020 June 10
DIT is pleased to announce our first-ever "Introduction to High Performance Computing" workshop, to be held (on-line) on 30 June 2020. It is intended to introduce and orient new users of the Deepthought and Juggernaut clusters. More details and registration information can be found on the main page for this workshop. We hope this will become a frequent offering.
2020 February 28
DIT is pleased to announce the addition of the OnDemand Web Portal for using the Deepthought2 and Juggernaut HPC clusters. It is hoped that this web portal will facilitate the use of the clusters by the UMD community, especially for those who are not comfortable with the Unix command line.
2020 January 13-16
The Winter 2020 HPC Programming Bootcamp will be held.

Historic Archives of news items