Learn Before
Aggregated Gradient Calculation
A machine learning model is being trained in a distributed environment using two workers. For a single parameter, worker 1 computes a gradient of +0.5 on its partition of the data, and worker 2 computes a gradient of -0.2 on its partition. What is the value of the final, aggregated gradient that will be used to update this parameter for the entire mini-batch?
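The arithmetic can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes the workers' gradients are combined by summation, and it also shows the averaged variant, since some setups divide by the number of workers. The function name `aggregate_gradients` and its `average` flag are hypothetical, introduced only for this example.

```python
def aggregate_gradients(worker_grads, average=False):
    """Combine per-worker gradients for a single parameter.

    Assumption: aggregation is a plain sum; pass average=True to
    divide by the number of workers instead.
    """
    total = sum(worker_grads)
    return total / len(worker_grads) if average else total


# Gradients from the question: worker 1 gives +0.5, worker 2 gives -0.2.
worker_grads = [0.5, -0.2]

summed = aggregate_gradients(worker_grads)
averaged = aggregate_gradients(worker_grads, average=True)

print(f"summed:   {summed:.2f}")
print(f"averaged: {averaged:.2f}")
```

Under the summation assumption the two contributions partially cancel, leaving a positive aggregated gradient that is smaller than worker 1's value alone.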
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
In a data-parallel distributed training setup with four workers, a mini-batch is split equally among them. For a particular training step, the gradient vectors calculated on three of the workers have a similar, small magnitude. However, the fourth worker calculates a gradient vector with a magnitude ten times larger than the others, possibly due to a corrupted data sample. According to the standard aggregation method for this setup, what is the most likely effect on the combined gradient used to update the model's parameters?
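The scenario above can be explored with a small numeric sketch. The gradient values below are made up for illustration, and the aggregation is assumed to be an element-wise sum across workers (averaging would rescale the result but not change its direction). The helper names `norm`, `sum_vectors`, and `cosine` are hypothetical.

```python
import math


def norm(v):
    """Euclidean magnitude of a gradient vector."""
    return math.sqrt(sum(x * x for x in v))


def sum_vectors(vectors):
    """Element-wise sum of gradient vectors (assumed aggregation rule)."""
    return [sum(components) for components in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))


# Hypothetical gradients: three workers with magnitude 0.1 each.
small = [[0.1, 0.0], [0.0, 0.1], [-0.1, 0.0]]
# Fourth worker: magnitude 1.0, ten times larger than the others.
outlier = [0.0, 1.0]

clean = sum_vectors(small)                 # aggregate without the outlier
combined = sum_vectors(small + [outlier])  # aggregate with the outlier

print("norm without outlier:", round(norm(clean), 3))
print("norm with outlier:   ", round(norm(combined), 3))
print("cosine(combined, outlier):", round(cosine(combined, outlier), 3))
```

With these illustrative numbers, the combined gradient is much larger than the aggregate of the three ordinary workers and points almost entirely in the outlier's direction, which is why a single anomalous worker can dominate the update when no clipping or robust aggregation is applied.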
Aggregated Gradient Calculation