Short Answer

Comparing Gradient Calculation Methods

Consider two scenarios for updating a model's parameters: one using the gradient calculated from a single, small subset of the training data, and the other using the gradient calculated from the entire training dataset. Explain the fundamental difference in the information provided by these two gradients and justify why, despite this difference, using the gradient from the small subset is a standard and effective practice in training large models.
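To make the two scenarios concrete, the following is a minimal sketch, not a reference answer. It assumes a toy linear-regression problem with a mean-squared-error loss; the data, the sizes (n, d, batch_size), and the helper names mse_gradient and minibatch_gradient are all hypothetical choices made for illustration. It computes the gradient on the full dataset, the gradient on one randomly drawn subset, and the average of many subset gradients, so the three can be compared.

```python
import numpy as np

def mse_gradient(w, X, y):
    """Gradient of the loss 0.5 * mean((X @ w - y)**2) with respect to w."""
    residual = X @ w - y
    return X.T @ residual / len(y)

def minibatch_gradient(w, X, y, batch_size, rng):
    """Gradient computed on one randomly drawn subset of the data."""
    idx = rng.choice(len(y), size=batch_size, replace=False)
    return mse_gradient(w, X[idx], y[idx])

# Hypothetical toy data: n examples, d features, small label noise.
rng = np.random.default_rng(0)
n, d, batch_size = 10_000, 5, 32
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = X @ true_w + 0.1 * rng.normal(size=n)
w = np.zeros(d)  # current parameters

# Scenario 1: gradient from the entire training dataset (touches n examples).
full_grad = mse_gradient(w, X, y)

# Scenario 2: gradient from a single small subset (touches batch_size examples).
one_batch_grad = minibatch_gradient(w, X, y, batch_size, rng)

# Averaging many independent subset gradients approaches the full gradient,
# which is what "unbiased but noisy estimate" means in practice.
batch_grads = np.stack(
    [minibatch_gradient(w, X, y, batch_size, rng) for _ in range(2000)]
)
mean_batch_grad = batch_grads.mean(axis=0)

full_norm = np.linalg.norm(full_grad)
one_batch_rel_err = np.linalg.norm(one_batch_grad - full_grad) / full_norm
mean_batch_rel_err = np.linalg.norm(mean_batch_grad - full_grad) / full_norm

print("relative error, single subset:      ", one_batch_rel_err)
print("relative error, mean of 2000 subsets:", mean_batch_rel_err)
```

In this sketch a single subset gradient costs roughly batch_size / n of the work of the full gradient per update, at the price of some noise in the direction. That trade-off is the one the question asks you to explain.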


Updated 2025-10-08


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science