
Parallelizing DL workloads on multiple GPUs: 14 February 2025

The Division of Information Technology (DIT) at the University of Maryland is pleased to announce the following workshop for users of our High Performance Computing (HPC) resources.

Parallelizing DL workloads on multiple GPUs
Date: 14 February 2025 (Section 1); 21 February 2025 (Sections 2 & 3)
Time: 9:00 AM - 1:00 PM (both days)
Location: Virtual. Contact information will be provided to registered attendees.
Instructor: TBD (NVIDIA Deep Learning Institute)
Cost: Free
Registration Form: link to registration form
Application Deadline: 13 February 2025, or when registration is full

Register Here

Overview

This workshop is targeted at members of the UMD community interested in accelerating deep learning (DL) training in multi-GPU environments, for instance on UMD's Zaratan cluster. It will discuss the effect of batch size, as well as other considerations affecting training performance and accuracy, for single- and multi-GPU workloads using PyTorch and PyTorch Distributed Data Parallel (DDP).
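For orientation, the following is a minimal sketch of the DDP pattern the workshop builds on; the toy model, dataset, and hyperparameters are illustrative placeholders, not workshop materials.

```python
# Minimal PyTorch DistributedDataParallel (DDP) training sketch.
# Illustrative only: the linear model and random data are placeholders.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy dataset; DistributedSampler shards it across processes.
    data = TensorDataset(torch.randn(1024, 32), torch.randn(1024, 1))
    sampler = DistributedSampler(data)
    loader = DataLoader(data, batch_size=64, sampler=sampler)

    model = torch.nn.Linear(32, 1).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()   # DDP all-reduces gradients across GPUs here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with, e.g., `torchrun --nproc_per_node=4 train.py`, each process drives one GPU and DDP averages gradients across them during the backward pass.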

This workshop is led by NVIDIA personnel, free of charge, as part of the NVIDIA Deep Learning Institute, and will cover the following topics:

  1. Significance of stochastic gradient descent and effects of batch size (see the first sketch after this list)
    • Understand the issues with sequential single-thread data processing and the theory behind speeding up applications with parallel processing.
    • Understand loss function, gradient descent, and stochastic gradient descent (SGD).
    • Understand the effect of batch size on accuracy and training time with an eye towards its use on multi-GPU systems.
  2. Key algorithmic considerations for retaining accuracy when training on multiple GPUs (see the second sketch after this list)
    • Understand what might cause accuracy to decrease when parallelizing training on multiple GPUs.
    • Learn and understand techniques for maintaining accuracy when scaling training to multiple GPUs.
  3. Workshop assessment and final review
    • Skills-based coding assessments to evaluate each participant's ability to train deep learning models on multiple GPUs.
    • Review of key learnings, workshop survey.
Section 1 will be covered on February 14; Sections 2 & 3 will be covered on February 21, 2025.
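
For orientation, this first sketch shows the stochastic gradient descent (SGD) update at the heart of topic 1: weights move against the gradient of a loss estimated on a random mini-batch, with the batch size as the knob the workshop examines. The data and hyperparameters are made-up placeholders.

```python
# Minimal SGD sketch: the update w <- w - lr * grad(loss), where the
# loss is estimated on a random mini-batch rather than the full dataset.
import torch

torch.manual_seed(0)
X, y = torch.randn(1024, 8), torch.randn(1024, 1)  # toy regression data
w = torch.zeros(8, 1, requires_grad=True)
lr, batch_size = 0.1, 64  # batch_size trades gradient noise for throughput

for step in range(100):
    idx = torch.randint(0, len(X), (batch_size,))  # sample a mini-batch
    loss = ((X[idx] @ w - y[idx]) ** 2).mean()     # mean-squared-error loss
    loss.backward()                                # gradient of the batch loss
    with torch.no_grad():
        w -= lr * w.grad                           # SGD update
        w.grad.zero_()                             # reset for the next step
```

Larger batches give lower-variance gradient estimates and better hardware utilization, but, as the workshop discusses, they can hurt accuracy if the learning rate is left untouched.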
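This second sketch illustrates one widely used technique of the kind topic 2 covers: the linear learning-rate scaling rule with gradual warmup. The workshop's exact recipe may differ, and all values here are hypothetical.

```python
# Sketch of the linear learning-rate scaling rule with gradual warmup,
# one common technique for preserving accuracy at large effective batch
# sizes (hypothetical values; not the workshop's official recipe).
import torch

base_lr = 0.1          # learning rate tuned for a single-GPU batch
base_batch = 64        # per-GPU batch size
world_size = 4         # number of GPUs / processes
warmup_epochs = 5

# Linear scaling rule: scale the LR with the effective (global) batch size.
effective_batch = base_batch * world_size
scaled_lr = base_lr * effective_batch / base_batch

model = torch.nn.Linear(32, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=scaled_lr)

def warmup_factor(epoch: int) -> float:
    # Ramp linearly from base_lr up to scaled_lr over the warmup epochs,
    # then hold the scaled rate.
    if epoch < warmup_epochs:
        start = base_lr / scaled_lr
        return start + (1.0 - start) * epoch / warmup_epochs
    return 1.0

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_factor)

for epoch in range(10):
    # ... one training epoch over the sharded data would run here ...
    scheduler.step()  # advance the warmup/scaling schedule
```

The warmup avoids the instability that an abruptly scaled learning rate can cause early in training, which is one reason accuracy can degrade when parallelizing naively.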

Prerequisites

Basic knowledge of the Python programming language and the use of Jupyter Notebook is assumed; previous experience with deep learning training using PyTorch is beneficial.

System Requirements

As this workshop will be offered online, you are expected to have a system (preferably a laptop or desktop) with the latest version of Chrome or Firefox installed.
Each participant will be provided with an NVIDIA cloud account and dedicated access to GPU-accelerated servers.

Miscellaneous Details

Register Here

While the registration deadline is 13 February 2025, we are limiting this workshop to 200 participants on a first-come, first-served basis, and registration will close once the workshop is full.

NOTE: DIT reserves the right to cancel the workshop for any reason on short notice.

If you have questions or need more information about the workshop, please feel free to contact us.
