
Parallelizing DL workloads on multiple GPUs: 14 February 2025

The Division of Information Technology (DIT) at the University of Maryland is pleased to announce the following workshop for users of our High Performance Computing (HPC) resources.

Parallelizing DL workloads on multiple GPUs
Date: 14 February 2025 (Section 1); 21 February 2025 (Sections 2 & 3)
Time: 9:00 AM - 1:00 PM (both days)
Location: Virtual. Contact information will be provided to registered attendees.
Instructor: TBD (NVIDIA Deep Learning Institute)
Cost: Free
Registration Form: link to registration form
Application Deadline: 13 February 2025, or when registration is full

Register Here

Overview

This workshop is targeted at members of the UMD community interested in accelerating deep learning (DL) training in multi-GPU environments, for instance on UMD's Zaratan cluster. It will discuss the effect of batch size, as well as other considerations affecting training performance and accuracy, for single- and multi-GPU workloads using PyTorch and PyTorch Distributed Data Parallel (DDP).
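For orientation, the following is a minimal sketch of the DDP pattern the workshop builds on; the toy model, dataset, and hyperparameters are illustrative placeholders, not workshop materials.

```python
# Minimal PyTorch DistributedDataParallel (DDP) training sketch.
# Illustrative only: the linear model and random data are placeholders.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy dataset; DistributedSampler shards it across processes.
    data = TensorDataset(torch.randn(1024, 32), torch.randn(1024, 1))
    sampler = DistributedSampler(data)
    loader = DataLoader(data, batch_size=64, sampler=sampler)

    model = torch.nn.Linear(32, 1).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()   # DDP all-reduces gradients across GPUs here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with, e.g., `torchrun --nproc_per_node=4 train.py`, each process drives one GPU and DDP averages gradients across them during the backward pass.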

This workshop is led by NVIDIA personnel, free of charge, as part of the NVIDIA Deep Learning Institute, and will cover the following topics:

  1. Significance of stochastic gradient descent and effects of batch size (see the first sketch after this list)
    • Understand the issues with sequential single-thread data processing and the theory behind speeding up applications with parallel processing.
    • Understand loss function, gradient descent, and stochastic gradient descent (SGD).
    • Understand the effect of batch size on accuracy and training time with an eye towards its use on multi-GPU systems.
  2. Key algorithmic considerations for retaining accuracy when training on multiple GPUs (see the second sketch after this list)
    • Understand what might cause accuracy to decrease when parallelizing training on multiple GPUs.
    • Learn and understand techniques for maintaining accuracy when scaling training to multiple GPUs.
  3. Workshop assessment and final review
    • Skills-based coding assessments to evaluate each participant's ability to train deep learning models on multiple GPUs.
    • Review of key learnings, workshop survey.
Section 1 will be covered on February 14; Sections 2 & 3 will be covered on February 21, 2025.
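
For orientation, this first sketch shows the stochastic gradient descent (SGD) update at the heart of topic 1: weights move against the gradient of a loss estimated on a random mini-batch, with the batch size as the knob the workshop examines. The data and hyperparameters are made-up placeholders.

```python
# Minimal SGD sketch: the update w <- w - lr * grad(loss), where the
# loss is estimated on a random mini-batch rather than the full dataset.
import torch

torch.manual_seed(0)
X, y = torch.randn(1024, 8), torch.randn(1024, 1)  # toy regression data
w = torch.zeros(8, 1, requires_grad=True)
lr, batch_size = 0.1, 64  # batch_size trades gradient noise for throughput

for step in range(100):
    idx = torch.randint(0, len(X), (batch_size,))  # sample a mini-batch
    loss = ((X[idx] @ w - y[idx]) ** 2).mean()     # mean-squared-error loss
    loss.backward()                                # gradient of the batch loss
    with torch.no_grad():
        w -= lr * w.grad                           # SGD update
        w.grad.zero_()                             # reset for the next step
```

Larger batches give lower-variance gradient estimates and better hardware utilization, but, as the workshop discusses, they can hurt accuracy if the learning rate is left untouched.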
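This second sketch illustrates one widely used technique of the kind topic 2 covers: the linear learning-rate scaling rule with gradual warmup. The workshop's exact recipe may differ, and all values here are hypothetical.

```python
# Sketch of the linear learning-rate scaling rule with gradual warmup,
# one common technique for preserving accuracy at large effective batch
# sizes (hypothetical values; not the workshop's official recipe).
import torch

base_lr = 0.1          # learning rate tuned for a single-GPU batch
base_batch = 64        # per-GPU batch size
world_size = 4         # number of GPUs / processes
warmup_epochs = 5

# Linear scaling rule: scale the LR with the effective (global) batch size.
effective_batch = base_batch * world_size
scaled_lr = base_lr * effective_batch / base_batch

model = torch.nn.Linear(32, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=scaled_lr)

def warmup_factor(epoch: int) -> float:
    # Ramp linearly from base_lr up to scaled_lr over the warmup epochs,
    # then hold the scaled rate.
    if epoch < warmup_epochs:
        start = base_lr / scaled_lr
        return start + (1.0 - start) * epoch / warmup_epochs
    return 1.0

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_factor)

for epoch in range(10):
    # ... one training epoch over the sharded data would run here ...
    scheduler.step()  # advance the warmup/scaling schedule
```

The warmup avoids the instability that an abruptly scaled learning rate can cause early in training, which is one reason accuracy can degrade when parallelizing naively.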

Prerequisites

Basic knowledge of the Python programming language and the use of Jupyter Notebook is assumed; previous experience with deep learning training using PyTorch is beneficial.

System Requirements

As this workshop will be offered online, you are expected to have a system (preferably a laptop or desktop) with the latest version of Chrome or Firefox installed.
Each participant will be provided with an NVIDIA cloud account and dedicated access to GPU-accelerated servers.

Miscellaneous Details

Register Here

While the registration deadline is 13 February 2025, we are limiting this workshop to 200 participants on a first-come, first-served basis, and registration will close once the workshop is full.

NOTE: DIT reserves the right to cancel the workshop for any reason on short notice.

If you have questions or need more information about the workshop, please feel free to contact us.
