Multiple Choice

A research team is training a large model across a heterogeneous cluster of computing devices from different manufacturers. They are using a low-precision 8-bit numerical format to accelerate the process. They observe that when they run the exact same training job with the same initial random seed, the final model parameters diverge slightly depending on which specific set of devices was allocated for the run. The training does not crash, and no error messages are generated. What is the most probable cause for this observed divergence?
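The scenario above hinges on a property of floating-point arithmetic: at low precision, addition is not associative, so the same reduction computed in a different order (as happens across different devices) can round differently. A minimal sketch of this effect, assuming NumPy's `float16` as a stand-in since NumPy has no 8-bit float type:

```python
import numpy as np

# Low-precision floating-point addition is not associative: the same three
# numbers summed in two different orders can round to different results.
# float16 stands in for an 8-bit format (NumPy has no native FP8 type).
a = np.float16(0.1)
b = np.float16(1000.0)

left = np.float16(np.float16(a + b) - b)   # (a + b) - b: a is absorbed by b's rounding
right = np.float16(a + np.float16(b - b))  # a + (b - b): a survives intact

print(left, right)  # left rounds to 0.0; right keeps a's value
```

At `float16` precision the spacing between representable values near 1000 is 0.5, so adding 0.1 to 1000 rounds away entirely in the first ordering but not in the second. The same mechanism, amplified over billions of accumulations, is why device-dependent reduction orders produce slightly different final parameters without any crash or error.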

Updated 2025-10-06


Tags: Ch.2 Generative Models - Foundations of Large Language Models, Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences, Analysis in Bloom's Taxonomy, Cognitive Psychology, Psychology, Social Science, Empirical Science, Science