Learn Before
An engineer is training a model using mini-batches and notices that while the overall training loss is decreasing over many updates, the loss value for individual mini-batches fluctuates significantly—sometimes increasing from one batch to the next. Which statement best analyzes the fundamental reason for this behavior based on the properties of the mini-batch loss gradient?
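The behavior described can be reproduced directly: because each mini-batch gradient (and loss) is computed on a random subset of the data, it is only a noisy estimate of the full-dataset quantity, so consecutive batch losses can rise even while the true objective falls. Below is a minimal sketch of this, using mini-batch SGD on a toy linear-regression problem; all names (`true_w`, `batch_losses`, the learning rate and batch size) are illustrative choices, not from the course material.

```python
import numpy as np

# Illustrative sketch: mini-batch SGD on synthetic linear regression.
# Each mini-batch loss is a noisy estimate of the full-dataset loss,
# so it fluctuates from batch to batch even as training progresses.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.5 * rng.normal(size=1000)  # noisy targets

w = np.zeros(5)
lr, batch_size = 0.05, 32
batch_losses = []
for step in range(300):
    idx = rng.choice(len(X), size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    err = Xb @ w - yb
    batch_losses.append(float(np.mean(err ** 2)))
    grad = 2 * Xb.T @ err / batch_size  # gradient of this batch's loss only
    w -= lr * grad

# Individual batch losses are not monotone: some steps increase the loss...
rises = sum(b > a for a, b in zip(batch_losses, batch_losses[1:]))
# ...yet averaged over many updates the loss clearly decreases.
early = float(np.mean(batch_losses[:50]))
late = float(np.mean(batch_losses[-50:]))
print(rises, early, late)
```

Running this shows many batch-to-batch increases (`rises > 0`) alongside a large drop in the averaged loss (`late` far below `early`), which is exactly the pattern the engineer observes: the expectation of the mini-batch gradient points downhill, but any single realization may not.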
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Distributed Gradient Calculation
Analyzing Gradient Magnitude
Comparing Gradient Calculation Methods