Learn Before
Language Model
For natural language data composed of discrete tokens like words, sequence models are specifically called language models. They provide the capacity to evaluate the likelihood of sentences, sample new text sequences, and optimize for the most likely outputs. By applying the chain rule of probability, language modeling can be reduced to an autoregressive prediction problem, decomposing the joint density of a sequence into the product of conditional densities in a left-to-right fashion: . For discrete signals, the autoregressive model acts as a probabilistic classifier, outputting a full probability distribution over the vocabulary for the next word given the leftwards context.
1
1
Contributors are:
Who are from:
Tags
Data Science
D2L
Dive into Deep Learning @ D2L