Total Loss Calculation for a Token Sequence
The total loss for a given sequence of tokens $(x_0, x_1, \dots, x_m)$ is computed by summing the individual losses over each position $i$ from $1$ to $m$. At each position $i$, a loss function $L$ measures the discrepancy between the model's predicted probability distribution for the next token ($p_i$) and the ground-truth distribution ($p_i^{\text{gold}}$). This is expressed generally as:

$$\mathrm{Loss} = \sum_{i=1}^{m} L\big(p_i, p_i^{\text{gold}}\big)$$
In natural language processing, this loss function is typically the cross-entropy loss. Because the ground-truth distribution is a one-hot vector over the vocabulary, the cross-entropy at each step reduces to the negative log-probability assigned to the correct token, leading to the specific formula:

$$\mathrm{Loss} = -\sum_{i=1}^{m} \log \Pr_{\theta}\big(x_i \mid x_0, \dots, x_{i-1}\big)$$
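The summation above can be sketched in a few lines of Python. The toy vocabulary, distributions, and probability values below are invented for illustration; since the ground truth is one-hot, each per-position loss is simply minus the log of the probability the model gave to the correct next token.

```python
import math

# Illustrative sketch of the total-loss computation described above.
# The vocabulary and probability values are made up for this example.
def cross_entropy_total_loss(predictions, targets):
    """Sum -log p_i(x_i) over positions: each p_i is the model's
    predicted next-token distribution at step i, and x_i is the
    ground-truth next token at that step."""
    return sum(-math.log(dist[tok]) for dist, tok in zip(predictions, targets))

# Predicted next-token distributions after "The" and after "The cat".
preds = [
    {"The": 0.1, "cat": 0.6, "sat": 0.1, "on": 0.1, "mat": 0.1},
    {"The": 0.05, "cat": 0.05, "sat": 0.7, "on": 0.1, "mat": 0.1},
]
targets = ["cat", "sat"]  # actual next tokens in the training text

total = cross_entropy_total_loss(preds, targets)
print(round(total, 4))  # -(ln 0.6 + ln 0.7) ≈ 0.8675
```

A confident prediction (0.6 for "cat") contributes a small loss, while a low probability on the correct token would dominate the sum, which is why one bad prediction can swamp the sequence-level loss.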
References
Reference of Foundations of Large Language Models Course
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.3 Prompting - Foundations of Large Language Models
Related
Total Loss Calculation for a Token Sequence
An auto-regressive language model is being trained on the text sequence: 'The quick brown fox jumps'. At the training step where the model has processed the input 'The quick brown fox', what two quantities are compared by the cross-entropy loss function to calculate the error signal for updating the model's parameters?
Language Model Training Step Analysis
An auto-regressive language model is being trained on a large text corpus. At one training step, the model processes the input 'The cat sat on the' and must predict the next token. The actual next token in the training data is 'mat'. Which of the following predicted probability distributions for the next token would result in the lowest cross-entropy loss?
Log-Likelihood Objective for Language Model Training
Formulating the MLE Objective for a Small Dataset
Total Loss Calculation for a Token Sequence
A model is being trained on a dataset containing just two sequences:
seq_1 = (x_0, x_1) and seq_2 = (y_0, y_1, y_2). According to the principle of maximum likelihood estimation for sequential data, which expression correctly represents the decomposed log-probability that the model aims to maximize for this entire dataset?
When training a model on a sequence of data using the Maximum Likelihood Estimation objective, a single prediction with a very low conditional probability for one element in the sequence can have a disproportionately large negative impact on the total log-probability calculated for that entire sequence.
Pre-trained Language Model Decoder Inference
Loss Function for RNN
Sample-wise Negative Log-Likelihood Loss for a Sub-sequence
Cross-Entropy Loss for Knowledge Distillation
A language model is being trained to generate the four-word sentence 'The quick brown fox'. The model generates one word at a time, and the error (loss) is calculated at each step:
- Loss for 'The' = 0.1
- Loss for 'quick' = 0.3
- Loss for 'brown' = 0.2
- Loss for 'fox' = 0.4
To update the model's parameters, the training process computes a single, overall loss value for the entire sentence. Which statement best analyzes this method of calculating the overall loss?
Total Loss Calculation for a Token Sequence
Calculating Average Sequence-Level Loss
Evaluating Training Strategies for a Translation Model
A language model is being trained to predict the next word in a sequence. The training process aims to minimize a loss value, which measures the difference between the model's predicted probability distribution for the next word and the actual correct word. Consider two separate predictions for the next word after the phrase 'The sun is shining...':
- Prediction A: The model assigns a probability of 0.75 to the correct word, 'brightly'.
- Prediction B: The model assigns a probability of 0.15 to the correct word, 'brightly'.
Which of the following statements accurately analyzes the loss values for these two predictions?
Total Loss Calculation for a Token Sequence
Evaluating Model Prediction Quality
Defining the Ground Truth Distribution
Learn After
Pre-training Objective for Language Models
Example of a Token Sequence
Example of an Indexed Token Sequence
A language model is evaluated on a sequence of four tokens,
(x_0, x_1, x_2, x_3). The model's performance is measured by calculating a loss value at each step of the sequence generation. The individual losses are as follows: the loss for predicting token x_1 is 1.2, the loss for predicting x_2 is 0.5, and the loss for predicting x_3 is 2.3. Based on this information, what is the total loss for the entire token sequence?
Comparative Model Performance Analysis
A language model's performance is being evaluated on the token sequence
('The', 'cat', 'sat', 'on'). The total loss for this sequence is calculated by summing the individual losses from each predictive step. Which of the following sets of predictions contributes to this total loss calculation?
Ground-Truth Distribution as a One-Hot Representation