Case Study

Choosing a Parallelism Strategy for a Large Model

A deep learning team is tasked with training a new language model with 200 billion parameters. They have a cluster of GPUs, but each individual GPU has only 40 GB of memory, far less than the model requires. The team proposes two potential training setups. Evaluate which setup is appropriate for this scenario, and justify your reasoning by explaining why the chosen setup works and why the other fails.
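A quick back-of-envelope check makes the constraint concrete. The sketch below assumes fp16 weights (2 bytes per parameter) and counts only the weights themselves; optimizer states, gradients, and activations would add several times more memory on top. The figures of 200 billion parameters and 40 GB per GPU come from the scenario above; the byte-width assumption is ours.

```python
# Back-of-envelope memory check: can one 40 GB GPU hold a 200B-parameter model?
# Assumption: fp16 weights (2 bytes/parameter), weights only -- optimizer
# states, gradients, and activations would add several times more memory.
PARAMS = 200e9          # 200 billion parameters (from the scenario)
BYTES_PER_PARAM = 2     # fp16 (assumed precision)
GPU_MEMORY_GB = 40      # per-GPU memory (from the scenario)

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
fits_on_one_gpu = weights_gb <= GPU_MEMORY_GB
min_gpus_for_weights = -(-weights_gb // GPU_MEMORY_GB)  # ceiling division

print(f"Weights alone: {weights_gb:.0f} GB")
print(f"Fits on a single 40 GB GPU? {fits_on_one_gpu}")
print(f"Minimum GPUs just to shard the weights: {min_gpus_for_weights:.0f}")
```

Since the weights alone occupy roughly 400 GB, any setup that replicates the full model on every GPU (pure data parallelism) cannot even load it, whereas a setup that partitions the model across at least ten GPUs (model, pipeline, or sharded parallelism) can.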

Updated 2025-10-03


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course
