Learn Before
A language model is being trained to fill in a masked word. For the input 'The cat sat on the [MASK]', the correct word is 'mat'. The training objective is to adjust the model to minimize the cross-entropy loss for its predictions. Below are four different potential outputs from the model, showing the probability it assigns to the word 'mat'. Which of these outputs would result in the LOWEST loss for this specific training example?
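A quick way to verify the intuition: for a single-token prediction with a one-hot target, cross-entropy loss reduces to the negative log of the probability assigned to the correct token, so the output giving 'mat' the highest probability yields the lowest loss. A minimal sketch in plain Python (the four probabilities below are illustrative assumptions, not the question's actual answer options):

import math

# Hypothetical probabilities the model might assign to the correct token 'mat'.
# These values are illustrative; the question's actual options are not shown here.
candidate_probs = [0.10, 0.35, 0.60, 0.95]

for p in candidate_probs:
    # Cross-entropy with a one-hot target reduces to -log p(correct token).
    loss = -math.log(p)
    print(f"p(mat) = {p:.2f}  ->  loss = {loss:.4f}")

# The loss shrinks as p('mat') grows: the highest probability
# (0.95 here) gives the lowest loss, about 0.0513.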
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Probability of a True Token in MLM
Predicted Probability Distribution in MLM
Example of MLM Training Objective with Multiple Masks
MLM Loss Function as Negative Log-Likelihood
Evaluating Model Performance via Cross-Entropy Loss
According to the standard Masked Language Modeling (MLM) training objective, a model's parameters are adjusted based on the cross-entropy loss computed over all masked tokens in a training batch. The masked positions are selected at random (typically around 15% of tokens) rather than strategically, and minimizing the summed cross-entropy is equivalent to maximizing the log-likelihood of the true tokens at those positions.
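As a sketch of how this objective is typically computed in practice, here is a minimal PyTorch example (the tensor shapes, the toy values, and the -100 ignore_index convention are assumptions for illustration, not part of this card):

import torch
import torch.nn.functional as F

# Toy setup: one sequence of 6 positions over a 10-token vocabulary.
vocab_size = 10
logits = torch.randn(1, 6, vocab_size)  # model scores at every position
# True token ids only at the masked positions; -100 marks unmasked tokens.
labels = torch.tensor([[-100, -100, 3, -100, 7, -100]])

# Cross-entropy over all masked positions at once; ignore_index=-100
# skips the unmasked tokens (a common Hugging Face convention).
loss = F.cross_entropy(
    logits.view(-1, vocab_size),  # (batch * seq_len, vocab)
    labels.view(-1),              # (batch * seq_len,)
    ignore_index=-100,
)
print(loss.item())  # average negative log-likelihood over the masked tokens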