Learn Before
Ground-Truth Distribution as a One-Hot Representation
In language modeling, the ground-truth distribution at a given position is defined as the one-hot representation of the actual next token: a vector with a 1 at the index of that token and 0 everywhere else. This one-hot vector acts as the exact target for the model's prediction at that step.
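A minimal sketch of this construction, using a hypothetical toy vocabulary (not from the course material): the ground-truth distribution is a one-hot vector over the vocabulary, with a 1 at the index of the actual next token.

```python
vocab = ["the", "cat", "sat", "on", "mat"]  # assumed example vocabulary

def one_hot(token, vocab):
    """Return the one-hot ground-truth distribution for `token`."""
    vec = [0.0] * len(vocab)
    vec[vocab.index(token)] = 1.0  # 1 at the actual next token's index
    return vec

print(one_hot("sat", vocab))  # [0.0, 0.0, 1.0, 0.0, 0.0]
```

The vector sums to 1, so it is a valid probability distribution that places all its mass on the correct token.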
References
Reference of Foundations of Large Language Models Course
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Pre-training Objective for Language Models
Example of a Token Sequence
Example of an Indexed Token Sequence
A language model is evaluated on a sequence of four tokens, (x_0, x_1, x_2, x_3). The model's performance is measured by calculating a loss value at each step of the sequence generation. The individual losses are as follows: the loss for predicting token x_1 is 1.2, the loss for predicting x_2 is 0.5, and the loss for predicting x_3 is 2.3. Based on this information, what is the total loss for the entire token sequence?
Comparative Model Performance Analysis
A language model's performance is being evaluated on the token sequence ('The', 'cat', 'sat', 'on'). The total loss for this sequence is calculated by summing the individual losses from each predictive step. Which of the following sets of predictions contributes to this total loss calculation?
Ground-Truth Distribution as a One-Hot Representation
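The total-loss questions above rely on one calculation: the loss for a sequence is the sum of the per-step prediction losses. A quick sketch using the per-step numbers stated in the first question:

```python
# Per-step losses for predicting x_1, x_2, x_3 (from the question above).
step_losses = [1.2, 0.5, 2.3]

# The total loss for the sequence is the sum over all predictive steps.
total_loss = sum(step_losses)
print(round(total_loss, 6))  # 4.0
```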
Learn After
A language model is being trained on a text corpus where it learns to predict the next word in a sequence. The model's entire vocabulary is ordered as follows: ['a', 'bright', 'day', 'is', 'shining']. If the model is given the input context 'a bright' and the actual next word in the training data is 'day', which vector correctly represents the ground-truth target for this specific training step?
In the context of training a language model, representing the ground-truth distribution as a one-hot vector implies that the training process considers all incorrect tokens to be equally wrong, regardless of their semantic similarity to the correct token.
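The "equally wrong" point above can be illustrated directly: with a one-hot target, cross-entropy depends only on the probability assigned to the correct token, so how the remaining mass is spread over incorrect tokens does not change the loss. The predicted distributions below are made-up toy values over the five-word vocabulary from the question.

```python
import math

def cross_entropy(one_hot_target, predicted):
    # Only the term where the target is 1 contributes to the sum,
    # so the loss reduces to -log p(correct token).
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, predicted) if t > 0)

target = [0.0, 0.0, 1.0, 0.0, 0.0]       # 'day' is the correct next word
pred_a = [0.10, 0.10, 0.60, 0.10, 0.10]  # mass spread evenly over wrong words
pred_b = [0.35, 0.02, 0.60, 0.02, 0.01]  # mass concentrated on one wrong word

# Both predictions give 'day' probability 0.6, so their losses are equal.
print(cross_entropy(target, pred_a) == cross_entropy(target, pred_b))  # True
```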
Explaining the Ground-Truth Vector