1Cademy - Language Model

Language Model

For natural language data composed of discrete tokens such as words, sequence models are called language models. They make it possible to evaluate the likelihood of sentences, sample new text sequences, and optimize for the most likely outputs. By applying the chain rule of probability, language modeling can be reduced to an autoregressive prediction problem, decomposing the joint probability of a sequence $P(x_1, \ldots, x_T)$ into a product of conditional probabilities in left-to-right order:

$$P(x_1, \ldots, x_T) = P(x_1) \prod_{t=2}^{T} P(x_t \mid x_{t-1}, \ldots, x_1).$$

For discrete signals, the autoregressive model acts as a probabilistic classifier: it outputs a full probability distribution over the vocabulary for the next word, given the leftward context.
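The three capabilities above (scoring a sentence, sampling text, and producing a next-token distribution) can be illustrated with a deliberately tiny model. The sketch below is an assumption-laden toy, not a reference implementation: it uses a hypothetical three-sentence corpus, a start token `<s>` standing in for the empty context of $x_1$, add-one smoothing, and, most importantly, a bigram (first-order Markov) approximation $P(x_t \mid x_{t-1}, \ldots, x_1) \approx P(x_t \mid x_{t-1})$ instead of the full left context. The function names are illustrative only.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical toy corpus (assumption): any whitespace-tokenized text would do.
CORPUS = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat saw the dog",
]


def build_bigram_model(sentences):
    """Count (previous token, next token) pairs.

    "<s>" is an assumed start-of-sequence marker that serves as the context for x_1.
    """
    counts = defaultdict(Counter)
    vocab = set()
    for sentence in sentences:
        tokens = sentence.split()
        vocab.update(tokens)
        prev = "<s>"
        for tok in tokens:
            counts[prev][tok] += 1
            prev = tok
    return counts, sorted(vocab)


def next_token_distribution(counts, vocab, prev):
    """The 'probabilistic classifier': a full distribution over the vocabulary.

    Bigram assumption: the context is truncated to the previous token only.
    Add-one smoothing gives unseen continuations a nonzero probability.
    """
    row = counts[prev]
    total = sum(row.values()) + len(vocab)
    return {tok: (row[tok] + 1) / total for tok in vocab}


def sequence_log_prob(counts, vocab, tokens):
    """Chain rule in log space: log P(x_1..x_T) = sum over t of log P(x_t | context).

    Tokens outside the vocabulary are not supported (KeyError).
    """
    log_p = 0.0
    prev = "<s>"
    for tok in tokens:
        log_p += math.log(next_token_distribution(counts, vocab, prev)[tok])
        prev = tok
    return log_p


def sample(counts, vocab, length, seed=0):
    """Generate text left to right, drawing each token from the next-token distribution."""
    rng = random.Random(seed)
    prev, out = "<s>", []
    for _ in range(length):
        dist = next_token_distribution(counts, vocab, prev)
        tok = rng.choices(list(dist), weights=list(dist.values()))[0]
        out.append(tok)
        prev = tok
    return out


if __name__ == "__main__":
    counts, vocab = build_bigram_model(CORPUS)
    fluent = sequence_log_prob(counts, vocab, "the cat sat".split())
    scrambled = sequence_log_prob(counts, vocab, "sat cat the".split())
    print(fluent > scrambled)  # True: the word order seen in the corpus scores higher
    print(" ".join(sample(counts, vocab, 6, seed=1)))
```

Working in log space avoids numerical underflow when the product of many conditional probabilities becomes very small. A neural language model would replace `next_token_distribution` with a network that conditions on the whole prefix and ends in a softmax over the vocabulary; the chain-rule scoring and left-to-right sampling loops would keep the same shape.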