Pre-trained Language Model Decoder Inference
Once an autoregressive language model has been optimized via Maximum Likelihood Estimation to find the parameters θ̂, the pre-trained model, denoted Pr_θ̂(·), can be used to compute the conditional probability Pr_θ̂(x_i | x_0, ..., x_{i-1}) of the next token at each position i within a given sequence.
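A minimal sketch of this inference step, assuming the Hugging Face transformers library and the public "gpt2" checkpoint; the model choice, prompt, and top-5 printout are illustrative, not from the source:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint; any pre-trained causal LM would work the same way.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits  # shape: (1, seq_len, vocab_size)

# The logits at the last position parameterize Pr_theta_hat(x_i | x_0..x_{i-1});
# a softmax turns them into a distribution over the vocabulary.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)

top_probs, top_ids = next_token_probs.topk(5)
for p, tok_id in zip(top_probs, top_ids):
    print(f"{tokenizer.decode(tok_id):>10s}  {p.item():.4f}")
```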
Tags
Foundations of Large Language Models
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Log-Likelihood Objective for Language Model Training
Formulating the MLE Objective for a Small Dataset
Total Loss Calculation for a Token Sequence
A model is being trained on a dataset containing just two sequences: seq_1 = (x_0, x_1) and seq_2 = (y_0, y_1, y_2). According to the principle of maximum likelihood estimation for sequential data, which expression correctly represents the decomposed log-probability that the model aims to maximize for this entire dataset?

When training a model on a sequence of data using the Maximum Likelihood Estimation objective, a single prediction with a very low conditional probability for one element in the sequence can have a disproportionately large negative impact on the total log-probability calculated for that entire sequence.
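For the question above, the chain-rule decomposition yields one log-term per token, summed across both sequences; a worked expansion, assuming the two sequences are independent samples so their log-probabilities add:

```latex
\log \Pr_{\theta}(\text{seq}_1) + \log \Pr_{\theta}(\text{seq}_2)
  = \log \Pr_{\theta}(x_0) + \log \Pr_{\theta}(x_1 \mid x_0)
  + \log \Pr_{\theta}(y_0) + \log \Pr_{\theta}(y_1 \mid y_0)
  + \log \Pr_{\theta}(y_2 \mid y_0, y_1)
```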
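The second point, that one near-zero conditional probability dominates the sum, follows from log p → −∞ as p → 0. A small numeric sketch with hypothetical per-token probabilities (the values are made up for illustration):

```python
import math

# Hypothetical conditional probabilities; values are illustrative only.
probs_seq1 = [0.9, 0.8]          # Pr(x_0), Pr(x_1 | x_0)
probs_seq2 = [0.7, 0.9, 1e-6]    # Pr(y_0), Pr(y_1 | y_0), Pr(y_2 | y_0, y_1)

total_log_prob = sum(math.log(p) for p in probs_seq1 + probs_seq2)
print(f"total log-probability: {total_log_prob:.3f}")          # ≈ -14.606
print(f"contribution of the 1e-6 term: {math.log(1e-6):.3f}")  # ≈ -13.816
```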