Formula
Average Cross-Entropy Loss for Sequence Modeling
To measure the quality of a language model and make performance comparable across documents of different lengths, we evaluate it using the cross-entropy loss averaged over all $n$ tokens in a sequence. The formula is given by: $\frac{1}{n} \sum_{t=1}^{n} -\log P(x_t \mid x_{t-1}, \ldots, x_1)$, where $P$ represents the conditional probability provided by the language model and $x_t$ is the actual token observed at time step $t$. A better model yields a lower average loss, which conceptually corresponds to spending fewer bits to compress the sequence.
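The averaged loss described above can be illustrated with a minimal sketch. It assumes the model's conditional probability for each observed token is already available as a plain list. The function name `average_cross_entropy` and the probability values are hypothetical and chosen only for illustration.

```python
import math


def average_cross_entropy(token_probs):
    """Average negative log-likelihood (in nats) over one sequence.

    token_probs[t] is assumed to be the probability the model assigned
    to the token actually observed at step t, given all earlier tokens.
    """
    if not token_probs:
        raise ValueError("sequence must contain at least one token")
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


# Hypothetical per-token probabilities from two models on the same text.
confident = [0.9, 0.8, 0.95, 0.85]
uncertain = [0.3, 0.2, 0.25, 0.4]

loss_confident = average_cross_entropy(confident)
loss_uncertain = average_cross_entropy(uncertain)

# Dividing by ln(2) converts nats to bits per token.
bits_per_token = loss_confident / math.log(2)

print(f"confident model: {loss_confident:.4f} nats/token")
print(f"uncertain model: {loss_uncertain:.4f} nats/token")
print(f"confident model: {bits_per_token:.4f} bits/token")
```

Because the sum is divided by the sequence length, the result does not grow with document size, and the model that assigns higher probability to the observed tokens gets the lower loss.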
Updated 2026-05-13
Tags
D2L
Dive into Deep Learning @ D2L