Calculating Contribution to MLM Training Objective
A language model is being trained on the original sentence 'The quick brown fox.' During one training step, the model receives the masked input 'The quick [MASK] fox.' and produces the following probability distribution for the masked position:
- P('brown' | 'The quick [MASK] fox') = 0.7
- P('red' | 'The quick [MASK] fox') = 0.2
- P('lazy' | 'The quick [MASK] fox') = 0.1
Based on the maximum likelihood estimation objective used in this type of training, calculate the specific log-probability value that this single masked token contributes to the total objective function. Explain the significance of this calculated value for the model's training process.
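The requested value can be computed directly. A minimal sketch, assuming the natural logarithm (the standard choice in cross-entropy implementations; the numeric answer changes if log base 2 or 10 is intended):

```python
import math

# Probability the model assigned to the correct token 'brown'
# at the masked position (from the question above).
p_correct = 0.7

# MLM objective contribution for this single masked token: log Pr(x_i | x̄).
contribution = math.log(p_correct)  # natural log

print(f"log P('brown') = {contribution:.4f}")  # ≈ -0.3567
```

The value is negative because the probability is below 1; training pushes it toward 0 by raising the probability assigned to the correct token.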
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
MLM Training Objective using Cross-Entropy Loss
In the context of training a language model, the objective is often to find parameters that maximize the likelihood of the training data. Consider the following mathematical expression for this objective:
Objective = ∑_{x ∈ D} ∑_{i ∈ A(x)} log Pr(xᵢ | x̄)

Here, D is the dataset, x is an original text sequence, x̄ is a version of x with some tokens masked, A(x) is the set of indices that were masked in x, and xᵢ is the original token at a masked position i. What does the inner summation, ∑_{i ∈ A(x)} log Pr(xᵢ | x̄), represent in this training process?

Calculating Contribution to MLM Training Objective
A language model is being trained with the objective of maximizing the log-probability of the original tokens at masked positions. For the original sentence 'The fox jumps over the dog', the model is given the masked input 'The fox [MASK] over the dog'. Which of the following model predictions for the [MASK] token would contribute the most to achieving the training objective for this specific instance?

Example of Masked Language Modeling Loss Calculation
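The inner summation from the objective above, ∑_{i ∈ A(x)} log Pr(xᵢ | x̄), can be sketched in a few lines. The probabilities below are illustrative placeholders, not outputs of a real model:

```python
import math

# Hypothetical probabilities the model assigns to the ORIGINAL token
# at each masked position i in A(x); keys are masked indices.
masked_predictions = {
    2: 0.7,  # P(original token at position 2 | masked input)
    5: 0.4,  # P(original token at position 5 | masked input)
}

# Inner summation: total log-probability over the masked positions of x.
inner_sum = sum(math.log(p) for p in masked_predictions.values())

print(f"∑ log Pr(x_i | x̄) = {inner_sum:.4f}")  # ≈ -1.2730
```

This per-sequence total is what the outer sum over the dataset D then accumulates; maximizing it is equivalent to minimizing the cross-entropy loss at the masked positions.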