1Cademy - Calculating a Global Value in a Distributed System

Learn Before

Collective Operation in Parallel Processing

Case Study

Calculating a Global Value in a Distributed System

A team is training a large machine learning model on a dataset that is partitioned across 16 separate computational nodes. To monitor the model's performance, they need to calculate the average loss (a measure of error) over the entire dataset. After a processing step, each of the 16 nodes has computed the sum of the loss for its own partition of the data. How would you design the next computational step to determine the single, final average loss value for the entire dataset, using the partial sums available on each node?
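One possible design, sketched below, is a sum all-reduce, the collective operation in which every node contributes a value and every node receives the combined total. The sketch simulates the 16 nodes in a single Python process using a recursive-doubling exchange pattern; in a real system the same step would normally be a single call to the communication library's all-reduce. It assumes each node also knows how many samples its partition holds, because partitions may differ in size and the global average is the total loss divided by the total sample count. The function names and the sample data are hypothetical.

```python
from typing import List, Tuple

# Each node holds (partial_loss_sum, sample_count) for its own partition.
Partial = Tuple[float, int]


def all_reduce_sum(values: List[Partial]) -> List[Partial]:
    """Simulate a sum all-reduce across nodes using recursive doubling.

    Assumption: the number of nodes is a power of two (16 in the case study).
    After log2(n) exchange rounds, every node holds the global totals.
    """
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "node count must be a power of two"

    current = list(values)
    step = 1
    while step < n:
        nxt = []
        for rank in range(n):
            partner = rank ^ step  # the node this rank exchanges with in this round
            loss_a, count_a = current[rank]
            loss_b, count_b = current[partner]
            nxt.append((loss_a + loss_b, count_a + count_b))
        current = nxt
        step *= 2
    return current


def global_average_loss(partials: List[Partial]) -> List[float]:
    """Return the average loss as seen by each node after the all-reduce."""
    reduced = all_reduce_sum(partials)
    return [total_loss / total_count for total_loss, total_count in reduced]


# Hypothetical data: node r has loss sum r + 1 over 10 samples.
partials = [(float(rank + 1), 10) for rank in range(16)]
averages = global_average_loss(partials)
print(averages[0])
```

Reducing the loss sum and the sample count together keeps the result correct when partitions are uneven; averaging the 16 per-node averages would only be correct if every partition had the same number of samples. If only one node needs the result, for example for logging, a reduce to a single root node followed by the division would be enough.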

Updated 2025-10-01
