Learn Before
Formula

Average Cross-Entropy Loss for Sequence Modeling

To measure the quality of a language model and make performance comparable across documents of different lengths, we evaluate it using the cross-entropy loss averaged over all $n$ tokens in a sequence. The formula is given by: $\frac{1}{n} \sum_{t=1}^n -\log P(x_t \mid x_{t-1}, \ldots, x_1)$, where $P$ represents the conditional probability provided by the language model and $x_t$ is the actual token observed at time step $t$. A better model yields a lower average loss, which conceptually corresponds to spending fewer bits to compress the sequence.
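As a rough illustration of the formula, the sketch below computes the average loss from a list of per-token probabilities. The function name `average_cross_entropy` and the probability values are hypothetical, chosen only for the example; a real model would supply $P(x_t \mid x_{t-1}, \ldots, x_1)$ for each observed token. The natural logarithm is assumed, so the result is in nats, and dividing by $\ln 2$ converts it to bits per token.

```python
import math


def average_cross_entropy(token_probs):
    """Average negative log-likelihood (in nats) over one sequence.

    token_probs[t] is the probability the model assigned to the token
    actually observed at step t, given all earlier tokens.
    """
    if not token_probs:
        raise ValueError("sequence must contain at least one token")
    return sum(-math.log(p) for p in token_probs) / len(token_probs)


# Hypothetical probabilities two models assign to the same 4-token sequence.
good_model = [0.5, 0.25, 0.5, 0.25]
poor_model = [0.1, 0.05, 0.2, 0.1]

loss_good = average_cross_entropy(good_model)
loss_poor = average_cross_entropy(poor_model)

# Convert nats to bits per token to match the compression reading.
bits_good = loss_good / math.log(2)

print(f"good model: {loss_good:.4f} nats/token ({bits_good:.2f} bits/token)")
print(f"poor model: {loss_poor:.4f} nats/token")
```

Because the sum is divided by $n$, sequences of different lengths yield comparable numbers; in this toy case the model that assigns higher probability to the observed tokens gets the lower average loss.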

Updated 2026-05-13

Tags

D2L

Dive into Deep Learning @ D2L