Learn Before
Calculating a Global Value in a Distributed System
A team is training a large machine learning model on a dataset that is partitioned across 16 separate computational nodes. To monitor the model's performance, they need to calculate the average loss (a measure of error) over the entire dataset. After a processing step, each of the 16 nodes has computed the sum of the loss for its own partition of the data. How would you design the next computational step to determine the single, final average loss value for the entire dataset, using the partial sums available on each node?
0
1
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
All-Reduce Algorithms
Collective Communication Toolkits
Calculating a Global Value in a Distributed System
A distributed system has four nodes, each processing a unique subset of a large dataset. Which of the following scenarios requires a collective operation to complete?
In a parallel processing system, a task is defined as a collective operation if, and only if, each computational node can complete its part of the task and arrive at the final global result independently, without communicating its intermediate results to any other node.
Distributed Computation of Weighted Value Sums