Learn Before
Imagine two language models are each tasked with completing the sentence 'The weather today is exceptionally...' and must now choose the very next word. Their internal calculations produce the following probability scores for the top three candidate words:
- Model 1: warm (0.6), sunny (0.3), bright (0.1)
- Model 2: warm (0.2), sunny (0.7), bright (0.1)
If a system combines these models by averaging their token-level probability distributions to make a decision, which word will it select as the next word in the sequence, and why?
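As a worked check, here is a minimal Python sketch of the averaging step, assuming greedy (argmax) selection over the averaged distribution; the variable names are illustrative and the probabilities are taken directly from the question:

```python
# Token-level probability distributions from the two models (from the question).
model1 = {"warm": 0.6, "sunny": 0.3, "bright": 0.1}
model2 = {"warm": 0.2, "sunny": 0.7, "bright": 0.1}

# Average the two distributions token by token.
averaged = {tok: (model1[tok] + model2[tok]) / 2 for tok in model1}
# -> warm: 0.4, sunny: 0.5, bright: 0.1

# Greedy selection picks the token with the highest averaged probability.
next_token = max(averaged, key=averaged.get)
print(next_token)  # sunny
```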
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Formula for Token-Level Model Averaging in Prompt Ensembling
Analysis of Text Generation Combination Methods
Choosing a Generation Combination Strategy