Case Study

Calculating a Global Value in a Distributed System

A team is training a large machine learning model on a dataset that is partitioned across 16 separate computational nodes. To monitor the model's performance, they need to calculate the average loss (a measure of error) over the entire dataset. After a processing step, each of the 16 nodes has computed the sum of the loss for its own partition of the data. How would you design the next computational step to determine the single, final average loss value for the entire dataset, using the partial sums available on each node?
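One possible design, sketched here under stated assumptions: every node contributes its partial loss sum and its sample count, both are summed across all nodes in a reduction, and the global sum is divided by the global count. The per-node sample count is an assumption, since the case only mentions partial sums; if every partition were known to be the same size, the counts could be replaced by a constant. The function names `tree_all_reduce_sum` and `global_average_loss` are hypothetical, and the "nodes" are simulated as entries of a Python list rather than real machines.

```python
from typing import List, Tuple


def tree_all_reduce_sum(values: List[float]) -> float:
    """Simulate a tree-style sum reduction over per-node values.

    In each round, the node at index i adds in the value held by the
    node at index i + stride, so 16 nodes finish in 4 rounds.
    """
    if not values:
        raise ValueError("need at least one node")
    vals = list(values)  # one entry per simulated node
    stride = 1
    while stride < len(vals):
        for i in range(0, len(vals) - stride, 2 * stride):
            vals[i] += vals[i + stride]
        stride *= 2
    return vals[0]


def global_average_loss(partials: List[Tuple[float, int]]) -> float:
    """Combine (loss_sum, sample_count) pairs into one global average.

    Assumption: each node reports how many samples its partition holds.
    """
    total_loss = tree_all_reduce_sum([loss for loss, _ in partials])
    total_count = tree_all_reduce_sum([float(n) for _, n in partials])
    if total_count == 0:
        raise ValueError("no samples across all nodes")
    return total_loss / total_count


if __name__ == "__main__":
    # 16 simulated nodes with unequal partition sizes (illustrative data only).
    partials = [(0.5 * (i + 1), i + 1) for i in range(16)]
    print(global_average_loss(partials))
```

Summing the losses and the counts separately, then dividing once at the end, keeps the result correct when partitions differ in size; averaging the 16 per-node averages would weight small partitions too heavily. In a real training job the same pattern is usually expressed with a sum all-reduce collective provided by the communication library, so that every node ends up holding the final value.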

Updated 2025-10-01

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Computing Sciences