
The Zaratan HPC cluster

Zaratan (zar-uh-tahn)
  1. n. A mythological sea turtle known for its long lifetime and gargantuan size.
  2. n. The University of Maryland's flagship HPC cluster, debuting in 2022, and described in more detail below.

The Zaratan High-Performance Computing (HPC) cluster is the University of Maryland's flagship HPC cluster, maintained by the Division of Information Technology and replacing the Deepthought2 cluster. Coming online in spring 2022, it features 360 compute nodes, each with dual AMD 7763 64-core CPUs. These CPUs are direct-liquid cooled to enable all of the approximately 50,000 CPU cores to run at full speed. There are also 20 GPU nodes, each containing four Nvidia A100 GPUs (for a total of 80 GPUs). Theoretical peak performance is 3.5 PFLOPS.

The cluster has HDR-100 (100 Gbit) InfiniBand interconnects between the nodes, with storage and service nodes connected with full HDR (200 Gbit). The cluster is connected with 200 Gbit Ethernet to various national networks.

The cluster provides 2 PB of high-performance parallel file storage (using BeeGFS) and 10 PB of longer-term archival storage (using Auristor).

Hardware

The following table lists the hardware on the Zaratan cluster:

| Description | Processor | Number of nodes | Cores/node | Total cores | Memory/node (GiB) | Memory/core (GiB) | Node-local /tmp per node (GB) | GPUs/node | Interconnect | Comments |
|---|---|---|---|---|---|---|---|---|---|---|
| Standard compute | AMD EPYC 7763, 2.45 GHz base (3.5 GHz turbo) | 360 | 128 | 46080 | 512 | 4 | 750 | 0 | HDR-100 | DLC of CPUs |
| A100 GPU Nodes | AMD EPYC 7763, 2.45 GHz base (3.5 GHz turbo) | 20 | 128 | 2560 | 512 | 4 | 750 | 4 Nvidia A100 | HDR-100 | |
| H100 GPU Nodes | Intel Xeon Platinum 8468 | 8 | 96 | 768 | 512 | 5.3 | 12 | 4 Nvidia H100 | HDR-100 | |
| Large Memory Nodes | AMD EPYC 7763, 2.45 GHz base (3.5 GHz turbo) | 6 | 128 | 768 | 2048 | 16 | 6 | 0 | HDR-100 | Partition bigmem |
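The derived columns in the table (total cores and memory per core) follow directly from the per-node figures. A quick sanity check:

```python
# Sanity check of the derived columns in the hardware table above.
nodes = {
    # name: (number of nodes, cores per node, memory per node in GiB)
    "Standard compute":   (360, 128, 512),
    "A100 GPU Nodes":     (20,  128, 512),
    "H100 GPU Nodes":     (8,    96, 512),
    "Large Memory Nodes": (6,   128, 2048),
}

for name, (n, cores, mem) in nodes.items():
    total_cores = n * cores
    mem_per_core = mem / cores
    print(f"{name}: {total_cores} total cores, {mem_per_core:.1f} GiB/core")
```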

The nodes containing GPUs have either quad Nvidia A100 Tensor Core GPUs with 40 GB of GPU RAM (using the Ampere architecture supporting CUDA compute capability 8.0) or quad Nvidia H100 Tensor Core GPUs with 80 GB of GPU RAM (using the Hopper architecture supporting CUDA compute capability 9.0).


The standard compute nodes are connected with HDR-100 (100 Gb/s) InfiniBand interconnects, and the GPU nodes have full HDR (200 Gb/s) InfiniBand.

The theoretical peak performance is 3.5 PFLOPS. The theoretical peak assumes ideal conditions, in which the calculations keep all the CPUs and GPUs fully utilized, which of course does not happen in practice. But these numbers are easy to compute and useful for rough comparisons. High-end laptops in 2022 (e.g. MacBook M1 Max with 24 or 32 core GPUs) have theoretical peaks of 5 to 10 TFLOPS, so Zaratan should be about 350 to 700 times faster.
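As a rough illustration of where a figure like 3.5 PFLOPS comes from: theoretical peak is cores × clock × FLOPs per cycle for the CPUs, plus the per-GPU peak times the GPU count. The FLOPs-per-cycle and per-GPU values below are assumptions typical of this hardware generation, not figures taken from the cluster documentation.

```python
# Back-of-envelope theoretical peak (FP64). The per-core and per-GPU
# figures are assumptions typical of this hardware, not official numbers.
cpu_cores = 360 * 128        # standard compute nodes x cores per node
cpu_clock_hz = 2.45e9        # EPYC 7763 base clock
flops_per_cycle = 16         # assumed: 2 AVX2 FMA units x 4 doubles x 2 ops

gpus = 20 * 4                # A100 GPU nodes x GPUs per node
gpu_peak_flops = 19.5e12     # assumed: A100 FP64 Tensor Core peak

cpu_peak = cpu_cores * cpu_clock_hz * flops_per_cycle
gpu_peak = gpus * gpu_peak_flops
total = cpu_peak + gpu_peak
print(f"estimated peak: {total / 1e15:.1f} PFLOPS")  # roughly 3.4 PFLOPS

# Comparison against a ~7 TFLOPS laptop GPU lands in the 350-700x range:
print(f"speedup vs a 7 TFLOPS laptop: ~{total / 7e12:.0f}x")
```

Under these assumptions the estimate lands close to the quoted 3.5 PFLOPS; small differences come from which clock rate (base vs. turbo) and which GPU peak (standard vs. Tensor Core FP64) one assumes.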