Learn Before
Chain Rule of Probability for Auto-regressive Language Models
Auto-regressive language models calculate the probability of a text sequence x = (x_1, x_2, …, x_n) by decomposing it into a product of conditional probabilities using the chain rule. The probability of each token is conditioned on all preceding tokens in the sequence. The general formula for a sequence of n tokens is:

P(x_1, x_2, …, x_n) = ∏_{t=1}^{n} P(x_t | x_1, …, x_{t-1})

For example, for a sequence of five tokens, this expands to:

P(x_1, x_2, x_3, x_4, x_5) = P(x_1) · P(x_2 | x_1) · P(x_3 | x_1, x_2) · P(x_4 | x_1, x_2, x_3) · P(x_5 | x_1, x_2, x_3, x_4)
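The factorization above can be sketched in a few lines of Python. This is a minimal illustration, not a real language model: the conditional probabilities below are made-up numbers for the three-token sequence "The quick brown", chosen only to show how the per-token factors multiply together under the chain rule.

```python
# Hypothetical conditional probabilities (illustrative values only,
# not produced by any actual trained model).
cond_probs = {
    ("The",): 0.20,                   # P(The)
    ("The", "quick"): 0.05,           # P(quick | The)
    ("The", "quick", "brown"): 0.30,  # P(brown | The, quick)
}

def sequence_probability(tokens):
    """Chain rule: multiply P(x_t | x_1, ..., x_{t-1}) over all positions t."""
    prob = 1.0
    for t in range(1, len(tokens) + 1):
        # Each factor is conditioned on the full prefix tokens[:t-1].
        prob *= cond_probs[tuple(tokens[:t])]
    return prob

p = sequence_probability(["The", "quick", "brown"])
# 0.20 * 0.05 * 0.30 = 0.003
```

In practice, models work with log-probabilities and sum them instead of multiplying, since products of many small probabilities underflow floating-point precision; the decomposition itself is the same.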
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Chain Rule of Probability for Auto-regressive Language Models
Permuted Language Modeling (PLM)
A language model is being trained on the sentence: 'The quick brown fox jumps over the lazy dog.' The model's primary purpose is to generate new text by predicting the next word in a sequence based only on the words that came before it. When the model is calculating the representation for the word 'jumps' during this process, which part of the sentence is it allowed to consider?
Permuted Language Modeling
Model Architecture Suitability for Sentiment Analysis
Rationale for Auto-Regressive Model Design in Text Generation
Learn After
Probability Factorization for Arbitrary Order Token Prediction
Step-by-Step Example of Auto-Regressive Sequence Generation
Standard Auto-Regressive Probability Factorization using Embeddings
A language model is designed to calculate the likelihood of a text sequence by predicting each token based only on the tokens that have come before it. Given the three-token sequence 'The quick brown', which of the following expressions correctly represents how this model would calculate the total probability of the entire sequence?
Example of Auto-Regressive Probability Calculation
Calculating Sequence Probability in an Auto-regressive Model
Debugging a Sequence Probability Calculation