Zaratan Cluster

The current flagship HPC cluster at the University of Maryland is the Zaratan cluster. Named after a mythological sea turtle known for its long lifetime and gargantuan size, the Zaratan cluster went online in August 2022 and consists of the following node types:

| Description | Number of nodes | Processor | Cores/node | Mem/node (GiB) | Mem/core (GiB) | /tmp size (TB) | GPUs/node |
|---|---|---|---|---|---|---|---|
| Standard compute | 360 | AMD Zen3 | 128 | 512 | 4 | 0.75 | none |
| Large memory | 6 | AMD Zen3 | 128 | 2048 | 16 | 6 | none |
| H100 | 8 | Intel SapphireRapids | 96 | 512 | 5.3 | 12 | 4 x H100 (80 GB) |
| A100 | 20 | AMD Zen3 | 128 | 512 | 4 | 0.75 | 4 x A100 |

The standard compute, large memory, and A100 nodes have dual AMD EPYC 7763 Zen3 processors, with 64 cores per CPU and a base speed of 2.45 GHz (3.5 GHz turbo speed).

The H100 nodes have dual Intel Xeon Platinum 8468 SapphireRapids processors, with 48 cores per CPU and a base speed of 2.1 GHz (3.8 GHz turbo speed).

Each NVIDIA A100 Tensor Core GPU uses the Ampere architecture (CUDA compute capability 8.0) and has 40 GiB of GPU RAM. These are SXM models of the GPUs, which support NVLink. As indicated by the name, each fractional a100_1g.5gb multi-instance GPU has 5 GiB of GPU RAM; its CUDA compute capability remains 8.0.

Each NVIDIA H100 Tensor Core GPU uses the Hopper architecture (CUDA compute capability 9.0) and has 80 GiB of GPU RAM. These are SXM models, which support NVLink.

The compute and large memory nodes have HDR100 interconnects. The GPU nodes have HDR interconnects.
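The derived figures in the node table (memory per core, GPUs per node type) can be cross-checked with a little arithmetic. The following is a minimal sketch that assumes only the numbers listed in the table; the dictionary keys and function names are hypothetical labels chosen for illustration and are not Slurm names.

```python
# Node types copied from the hardware table:
# label -> (number of nodes, cores per node, memory per node in GiB, GPUs per node)
# The labels are illustrative only.
NODE_TYPES = {
    "standard": (360, 128, 512, 0),
    "large_memory": (6, 128, 2048, 0),
    "h100": (8, 96, 512, 4),
    "a100": (20, 128, 512, 4),
}


def mem_per_core(label):
    """Memory per core in GiB for one node type."""
    _, cores, mem_gib, _ = NODE_TYPES[label]
    return mem_gib / cores


def total_gpus(label):
    """Total GPUs across all nodes of one type."""
    nodes, _, _, gpus = NODE_TYPES[label]
    return nodes * gpus


def total_cores():
    """Total CPU cores across every node type in the table."""
    return sum(nodes * cores for nodes, cores, _, _ in NODE_TYPES.values())
```

Under these numbers, `mem_per_core("h100")` rounds to the 5.3 GiB shown in the table, and `total_gpus` gives 32 H100 and 80 A100 GPUs, which agrees with the GRES section further down.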

Zaratan Partitions

| Partition name | Maximum Walltime | Notes |
|---|---|---|
| standard | 7 days | All jobs w/out special requirements |
| debug | 15 min | Short test/debug jobs |
| bigmem | 7 days | Jobs needing large amounts of memory |
| gpu | 7 days | Jobs needing GPUs |
| scavenger | 14 days | Free, but low priority and preemptible |
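A requested walltime can be checked against these limits before a job is submitted. The sketch below is illustrative only: the limits are copied from the partition table, the helper names are hypothetical, and the output string follows Slurm's `days-hours:minutes:seconds` time format.

```python
from datetime import timedelta

# Maximum walltimes copied from the partition table.
MAX_WALLTIME = {
    "standard": timedelta(days=7),
    "debug": timedelta(minutes=15),
    "bigmem": timedelta(days=7),
    "gpu": timedelta(days=7),
    "scavenger": timedelta(days=14),
}


def fits(partition, walltime):
    """Return True if the requested walltime is within the partition limit."""
    return walltime <= MAX_WALLTIME[partition]


def slurm_time(walltime):
    """Format a timedelta as a D-HH:MM:SS string for a time limit request."""
    total = int(walltime.total_seconds())
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"
```

For example, `fits("debug", timedelta(minutes=20))` is false, so a 20-minute test job would need a different partition or a shorter request.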

Zaratan Features/Constraints

The following features or constraints are defined on the Zaratan cluster and can be requested with the sbatch --constraint flag:

| Feature | Description |
|---|---|
| amd | Node has AMD based CPUs |
| beeond | Node supports BeeOND |
| epyc_7702 | Node has AMD EPYC 7702 CPUs |
| epyc_7763 | Node has AMD EPYC 7763 CPUs |
| epyc_9124 | Node has AMD EPYC 9124 CPUs |
| ib | Node supports Infiniband |
| intel | Node has Intel based CPUs |
| noib | Node does not have Infiniband |
| nvme | Node has NVMe disks |
| rhel8 | Node is running Red Hat Enterprise Linux version 8 |
| xeon_6248 | Node has Intel Xeon 6248 CPUs |
| xeon_8468 | Node has Intel Xeon 8468 CPUs |
| xeon_8592 | Node has Intel Xeon 8592 CPUs |
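When several features are required at once, Slurm's constraint syntax joins them with `&`. The following sketch shows one way to assemble such a request; the helper names and the script name `job.sh` are hypothetical, and the feature list is simply copied from the table above.

```python
import shlex

# Feature names copied from the table above.
KNOWN_FEATURES = {
    "amd", "beeond", "epyc_7702", "epyc_7763", "epyc_9124", "ib", "intel",
    "noib", "nvme", "rhel8", "xeon_6248", "xeon_8468", "xeon_8592",
}


def sbatch_args(script, features, partition=None):
    """Build an sbatch argument list requiring all of the given features."""
    unknown = sorted(set(features) - KNOWN_FEATURES)
    if unknown:
        raise ValueError(f"unknown feature(s): {', '.join(unknown)}")
    args = ["sbatch"]
    if partition is not None:
        args.append(f"--partition={partition}")
    if features:
        # '&' means every listed feature must be present on the node.
        args.append("--constraint=" + "&".join(features))
    args.append(script)
    return args


def as_shell(args):
    """Render the argument list as a shell-safe command line."""
    return shlex.join(args)
```

Quoting matters here because `&` is special to the shell; `as_shell` wraps the constraint argument in single quotes so the command can be pasted into a terminal unchanged.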

Zaratan GRESes

| GRES | Description | Number in cluster | Hourly SU cost | CUDA Compute Capability |
|---|---|---|---|---|
| gpu:h100 | NVIDIA Hopper H100 GPU (80GB) | 32 | 144 SU/hr | 9.0 |
| gpu:a100 | NVIDIA Ampere A100 GPU (40GB) | 76 | 48 SU/hr | 8.0 |
| gpu:a100_1g.5gb | Fractional (1/7) A100 GPU (5GB) | 28† | 7 SU/hr | 8.0 |

† Note: The number of physical A100 GPUs that are split into smaller virtual GPUs, and potentially the sizes of those virtual GPUs, is subject to change without advance notice as we gauge how best to distribute the resources to meet user demand (and as that demand changes). The numbers listed were accurate at the time of writing. Since there are currently 80 physical A100 GPUs on Zaratan, the number of a100 GPUs plus 1/7 of the number of a100_1g.5gb virtual GPUs will equal 80.
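Both the hourly GPU rates and the A100 accounting in the note reduce to simple arithmetic. The sketch below assumes only the rates and counts listed in the GRES table; the function names are hypothetical, and the cost covers the listed GPU rates only, since any other charges are outside what the table states.

```python
from fractions import Fraction

# Hourly SU rates copied from the GRES table.
GPU_SU_PER_HOUR = {
    "gpu:h100": 144,
    "gpu:a100": 48,
    "gpu:a100_1g.5gb": 7,
}


def gpu_su_cost(gres, count, hours):
    """SUs charged for `count` GPUs of one GRES type over `hours` hours."""
    return GPU_SU_PER_HOUR[gres] * count * hours


def physical_a100s(whole, fractional):
    """Physical A100 count: whole GPUs plus 1/7 of the fractional instances."""
    return whole + Fraction(fractional, 7)
```

With the listed counts, `physical_a100s(76, 28)` equals 80, matching the note, and a two-hour job on four full A100 GPUs would come to `gpu_su_cost("gpu:a100", 4, 2)`, or 384 SU.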