Learn Before
Comparing Input Corruption Strategies
A language model is being trained with a denoising objective. Compare the learning pressures placed on the model when using a token alteration strategy (replacing a token with another from the vocabulary) versus a token masking strategy (replacing a token with a special [MASK] symbol). Specifically, how does each strategy influence what the model must learn from the surrounding context to reconstruct the original token?
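The two strategies can be contrasted in a minimal sketch. This is an illustrative toy implementation, not from the source: the function name `corrupt`, the corruption probability, and the tiny vocabulary are all assumptions. The key difference it shows: masking marks exactly where reconstruction is needed, while alteration forces the model to also detect which tokens are wrong.

```python
import random

def corrupt(tokens, vocab, strategy, p=0.5, seed=0):
    """Corrupt a token sequence for a denoising objective (toy sketch).

    strategy="mask":  replace chosen tokens with a special [MASK] symbol,
                      so the model always knows *where* to reconstruct.
    strategy="alter": replace chosen tokens with a different vocabulary token,
                      so the model must first *detect* which tokens are wrong,
                      then reconstruct them from context.
    """
    rng = random.Random(seed)
    out = []
    for tok in tokens:
        if rng.random() < p:
            if strategy == "mask":
                out.append("[MASK]")
            else:  # "alter": pick any vocabulary token other than the original
                out.append(rng.choice([t for t in vocab if t != tok]))
        else:
            out.append(tok)
    return out

tokens = ["the", "cat", "sat", "on", "the", "mat"]
vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran", "your"]
print(corrupt(tokens, vocab, "mask"))   # corrupted positions are visibly marked
print(corrupt(tokens, vocab, "alter"))  # corrupted positions look like real words
```

Note how the altered sequence remains entirely plausible-looking text, which is why alteration places an extra detection burden on the model that masking does not.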
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model developer is pre-training a model with the specific goal of improving its ability to identify and correct sentences containing incorrect word choices (e.g., distinguishing between 'your' and 'you're'). The model is trained to reconstruct the original, correct sentence from a deliberately corrupted version. Which of the following input corruption strategies would be most effective for this specific training objective?
Comparing Input Corruption Strategies
Evaluating Input Corruption Strategies for Typo Resilience