The Zaratan High-Performance Computing (HPC) cluster is the University of Maryland's flagship HPC cluster, maintained by the Division of Information Technology and replacing the Deepthought2 cluster. It came online in spring 2022 and features 360 compute nodes, each with dual AMD EPYC 7763 64-core CPUs. These CPUs are direct-liquid cooled so that all of the approximately 50,000 CPU cores can run at full speed. There are also 20 GPU nodes, each containing four Nvidia A100 GPUs (for a total of 80 GPUs). Theoretical peak performance is 3.5 PFLOPS.
The cluster has HDR-100 (100 Gbit/s) InfiniBand interconnects between the nodes, with storage and service nodes connected with full HDR (200 Gbit/s). The cluster is connected to various national networks with 200 Gbit/s Ethernet.
The cluster provides 2 PB of high-performance parallel file storage (using BeeGFS) and 10 PB of longer-term archival storage (using Auristor).
The following table lists the hardware on the Zaratan cluster:
Description | Processor | Number of nodes | Cores/node | Total cores | Memory/node (GB) | Memory/core (GB) | Scratch space/node (GB) | GPUs/node | Interconnect | Comments |
---|---|---|---|---|---|---|---|---|---|---|
Standard compute | AMD EPYC 7763, 2.45 GHz base (3.5 GHz turbo) | 360 | 128 | 46080 | 512 | 4 | 750 | 0 | HDR-100 | DLC of CPUs |
A100 GPU Nodes | AMD EPYC 7763, 2.45 GHz base (3.5 GHz turbo) | 20 | 128 | 2560 | 512 | 4 | 750 | 4 Nvidia A100 | HDR-100 | |
Serial Nodes | AMD EPYC 7502, 2.5 GHz base (3.5 GHz turbo) | 19 | 64 | 1216 | 1024 | 32 | 12,000 | 0 | GigE | |
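The "Total cores" column follows directly from the node counts and cores per node, and summing it gives the "approximately 50,000 CPU cores" quoted earlier. A minimal sketch of that arithmetic, using only figures from the table (the dictionary layout and function name are illustrative, not part of any cluster tooling):

```python
# Per-node figures copied from the hardware table above.
# The dictionary layout is purely illustrative.
node_types = {
    "Standard compute": {"nodes": 360, "cores_per_node": 128, "gpus_per_node": 0},
    "A100 GPU nodes": {"nodes": 20, "cores_per_node": 128, "gpus_per_node": 4},
    "Serial nodes": {"nodes": 19, "cores_per_node": 64, "gpus_per_node": 0},
}


def total_cores(entry):
    """Total CPU cores for one node type: nodes times cores per node."""
    return entry["nodes"] * entry["cores_per_node"]


cores_by_type = {name: total_cores(entry) for name, entry in node_types.items()}
cluster_cores = sum(cores_by_type.values())
cluster_gpus = sum(e["nodes"] * e["gpus_per_node"] for e in node_types.values())

for name, cores in cores_by_type.items():
    print(f"{name}: {cores} cores")
print(f"Cluster total: {cluster_cores} cores, {cluster_gpus} GPUs")
```

The per-type results match the table's "Total cores" column, and the cluster total comes to just under 50,000 cores.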
Each GPU node contains four Nvidia A100 Tensor Core GPUs, which use the Ampere architecture and support CUDA compute capability 8.0.
The cluster has 2 PB of high-performance short-term file storage (using BeeGFS) as well as 10 PB of longer-term storage (using Auristor).
The standard compute nodes are connected with HDR-100 (100 Gb/s) InfiniBand interconnects, and the GPU nodes have full HDR (200 Gb/s) InfiniBand.
The theoretical peak performance is 3.5 PFLOPS. Theoretical peak performance assumes ideal conditions, in which the calculations keep all of the CPUs and GPUs fully utilized; this does not happen in practice. These numbers are nevertheless easy to compute and useful for rough comparisons. High-end laptops in 2022 (e.g. a MacBook with an M1 Max and a 24- or 32-core GPU) have theoretical peaks of 5-10 TFLOPS, so by this measure Zaratan should be about 350 to 700 times faster.
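The comparison above is a simple ratio of peak figures, and a peak figure itself is just units multiplied by clock rate and operations per cycle. The sketch below first reproduces the 350 to 700 range, then shows one plausible way to arrive at a number near 3.5 PFLOPS. The second part rests on assumptions that are not stated in this document: 16 double-precision FLOPs per cycle per core for these AMD CPUs, base clock speeds rather than turbo, and 19.5 TFLOPS of FP64 Tensor Core throughput per A100. How the quoted 3.5 PFLOPS was actually derived is not given here.

```python
PFLOPS = 1e15
TFLOPS = 1e12

# --- Part 1: ratio quoted in the text ---
zaratan_peak = 3.5 * PFLOPS
laptop_peak_low = 5 * TFLOPS
laptop_peak_high = 10 * TFLOPS

min_ratio = zaratan_peak / laptop_peak_high  # against the fastest laptop
max_ratio = zaratan_peak / laptop_peak_low   # against the slowest laptop
print(f"Zaratan is roughly {min_ratio:.0f}x to {max_ratio:.0f}x faster")

# --- Part 2: rough peak estimate (assumptions, not the official method) ---
# ASSUMPTION: 16 double-precision FLOPs per cycle per core.
FLOPS_PER_CYCLE = 16
# ASSUMPTION: FP64 Tensor Core peak per A100, in FLOPS.
A100_FP64_PEAK = 19.5 * TFLOPS


def cpu_peak(total_cores, base_clock_hz):
    """Peak CPU FLOPS = cores * clock rate * FLOPs per cycle."""
    return total_cores * base_clock_hz * FLOPS_PER_CYCLE


# Core counts and base clocks are taken from the hardware table.
epyc_7763_peak = cpu_peak(46080 + 2560, 2.45e9)  # standard + GPU nodes
epyc_7502_peak = cpu_peak(1216, 2.5e9)           # serial nodes
gpu_peak = 80 * A100_FP64_PEAK                   # 20 nodes * 4 GPUs

estimated_peak = epyc_7763_peak + epyc_7502_peak + gpu_peak
print(f"Estimated peak: {estimated_peak / PFLOPS:.2f} PFLOPS")
```

Under these assumptions the CPUs and GPUs contribute comparable shares of the total, which is why real workloads that use only one or the other cannot approach the combined peak.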