Learn Before
Deep Learning Approach to Language Modeling
In the era of deep learning, a typical approach to language modeling is to estimate token probabilities using a deep neural network. Neural networks trained to accomplish this task receive a sequence of context tokens, $x_0, x_1, \dots, x_{i-1}$, as input and produce a distribution over the vocabulary $V$, which is denoted by $\Pr(\cdot \mid x_0, x_1, \dots, x_{i-1})$. The probability of the specific token $x_i$, denoted as $\Pr(x_i \mid x_0, x_1, \dots, x_{i-1})$, is the value of the $x_i$-th entry of this output distribution.
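As a concrete illustration, here is a minimal sketch of the final step of such a network: hypothetical logits for a toy vocabulary are converted into a probability distribution with a softmax, and a specific token's probability is read off at that token's vocabulary index. The vocabulary and the logit values are made up for illustration; in a real model the logits come from the network's forward pass over the context.

```python
import math

# Toy vocabulary and made-up logits; in a real model the logits are
# produced by a deep network applied to the context tokens x_0, ..., x_{i-1}.
vocab = ["the", "cat", "sat", "mat"]
logits = [1.2, 0.3, 0.1, 0.7]

def softmax(scores):
    """Convert raw scores into a probability distribution over the vocabulary."""
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

dist = softmax(logits)                       # Pr(. | context): one entry per vocab token
assert abs(sum(dist) - 1.0) < 1e-9          # a valid distribution sums to 1

# Pr(x_i = "cat" | context) is the entry of the distribution at "cat"'s index.
p_cat = dist[vocab.index("cat")]
```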
References
Reference of Foundations of Large Language Models Course
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Types of Language Models
Evaluating language models
Shannon's Foundational Work on Language Modeling
Generalization of the Language Modeling Concept
Chain Rule for Sequence Probability
Deep Learning Approach to Language Modeling
Output Token Sequence in LLMs
Start of Sentence (SOS) Token
[CLS] Token as a Start Symbol
A system is designed to predict the probability of a sequence of words. For the sequence 'The dog ran', the system provides the following conditional probabilities:
- The probability of 'The' occurring at the start of a sequence is 0.2.
- The probability of 'dog' occurring after 'The' is 0.3.
- The probability of 'ran' occurring after 'The dog' is 0.7.
Based on the fundamental principle used by such systems to determine the likelihood of a full sequence, what is the overall probability of the sequence 'The dog ran'?
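The principle such systems rely on is the chain rule: the probability of the whole sequence is the product of the conditional probabilities of its tokens, taken in order. A minimal sketch using the numbers stated above:

```python
import math

# Conditional probabilities given in the question, in sequence order:
#   Pr(The) = 0.2, Pr(dog | The) = 0.3, Pr(ran | The dog) = 0.7
conditionals = [0.2, 0.3, 0.7]

# Chain rule: Pr(The dog ran) = Pr(The) * Pr(dog | The) * Pr(ran | The dog)
sequence_probability = math.prod(conditionals)
```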
Analyzing Language Model Probability Assignments
A system's primary goal is to predict the probability of a sequence of tokens. To calculate the total probability for the sequence 'The quick brown fox', it breaks the problem down into a series of conditional probability calculations. Arrange the following calculations in the correct order that the system would use to find the total probability of the sequence.
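The decomposition order can be sketched generically: each token is conditioned on all of the tokens that precede it, left to right. The helper below is hypothetical and only builds labels for the factors, but it makes the ordering explicit:

```python
def chain_rule_factors(tokens):
    """List the conditional probabilities, in order, whose product
    gives the total sequence probability (the chain rule)."""
    return [
        f"Pr({tok} | {' '.join(tokens[:i])})" if i else f"Pr({tok})"
        for i, tok in enumerate(tokens)
    ]

# For 'The quick brown fox' the system evaluates, in order:
#   Pr(The), Pr(quick | The), Pr(brown | The quick), Pr(fox | The quick brown)
factors = chain_rule_factors(["The", "quick", "brown", "fox"])
```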
Evaluating a Language Model's Probabilistic Output
Learn After
Probability Distribution Formula for an Encoder-Softmax Language Model
Auto-Regressive Generation Process
Formal Definition of LLM Inference
Model Parameterization by θ
A language model built with a deep neural network is given the input sequence 'The cat sat on the'. The model's vocabulary consists of the following tokens: {a, cat, hat, mat, on, sat, the}. What does the model produce as its immediate, direct output to predict the very next token?
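The key point behind this question is that the model's immediate output is not a single token but a probability distribution with one entry per vocabulary item; choosing a next token (by argmax or sampling) is a separate step applied on top of that distribution. A sketch with the question's seven-token vocabulary and arbitrary, made-up logits:

```python
import math

vocab = ["a", "cat", "hat", "mat", "on", "sat", "the"]
logits = [0.2, 0.9, 0.1, 2.3, 0.0, 0.4, 0.7]   # hypothetical network scores

# Immediate, direct output: a probability distribution over the vocabulary.
exps = [math.exp(s) for s in logits]
total = sum(exps)
dist = [e / total for e in exps]               # one probability per vocab token

assert len(dist) == len(vocab)
assert abs(sum(dist) - 1.0) < 1e-9

# Token selection happens afterwards, on top of the distribution
# (here, greedy argmax decoding).
next_token = vocab[dist.index(max(dist))]
```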
Analyzing Language Model Outputs
Explaining Language Model Output Behavior