Formulating the MLE Objective for a Small Dataset
A model is being trained on a small dataset consisting of just two sequences of tokens: Sequence A = (x₁, x₂, x₃) and Sequence B = (y₁, y₂). The training process aims to find model parameters that maximize the total log-probability of this dataset. Write out the specific mathematical expression that represents this total log-probability, decomposed into a sum of individual conditional log-probabilities.
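Under the standard autoregressive factorization, one way to write this decomposition (a sketch, assuming each sequence is scored independently and each token is conditioned only on its own sequence's prefix) is:

```latex
\log P(\mathcal{D}) =
\underbrace{\log P(x_1) + \log P(x_2 \mid x_1) + \log P(x_3 \mid x_1, x_2)}_{\text{Sequence A}}
+ \underbrace{\log P(y_1) + \log P(y_2 \mid y_1)}_{\text{Sequence B}}
```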
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Log-Likelihood Objective for Language Model Training
Formulating the MLE Objective for a Small Dataset
Total Loss Calculation for a Token Sequence
A model is being trained on a dataset containing just two sequences: seq_1 = (x_0, x_1) and seq_2 = (y_0, y_1, y_2). According to the principle of maximum likelihood estimation for sequential data, which expression correctly represents the decomposed log-probability that the model aims to maximize for this entire dataset?

When training a model on a sequence of data using the Maximum Likelihood Estimation objective, a single prediction with a very low conditional probability for one element in the sequence can have a disproportionately large negative impact on the total log-probability calculated for that entire sequence.
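The claim in that last statement can be checked numerically. A minimal sketch (the probability values below are hypothetical, chosen only to illustrate the effect):

```python
import math

# Hypothetical per-token conditional probabilities for a 4-token sequence.
good_preds = [0.9, 0.8, 0.85, 0.9]   # all confident predictions
one_bad = [0.9, 0.8, 0.85, 1e-6]     # identical, except one near-zero prediction

def total_log_prob(probs):
    """Sum of conditional log-probabilities: the per-sequence MLE objective."""
    return sum(math.log(p) for p in probs)

# The single low-probability token contributes log(1e-6) ≈ -13.8,
# dwarfing the combined contribution of the three confident tokens.
print(total_log_prob(good_preds))  # ≈ -0.60
print(total_log_prob(one_bad))     # ≈ -14.31
```

Because the objective sums logarithms, a probability near zero drives its log term toward negative infinity, so one bad prediction can dominate the entire sequence's score.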
Pre-trained Language Model Decoder Inference