Rationale for Mixed Corruption Strategies in Pre-training
A language model is being pre-trained with a denoising objective. Instead of corrupting the input text with a single method throughout (e.g., always masking tokens), the training process randomly applies one of several corruption methods (masking, token replacement, or reordering) to each training example. Analyze the primary advantage of this mixed-method approach over relying on a single type of corruption for the entire pre-training phase.
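To make the setup concrete, the sketch below shows one way such per-example corruption might be implemented. It is a minimal illustration in plain Python, assuming whitespace-tokenized text, a hypothetical [MASK] symbol, and a toy vocabulary; a real pre-training pipeline would instead operate on token IDs from the model's tokenizer.

import random

MASK_TOKEN = "[MASK]"  # hypothetical mask symbol; real tokenizers define their own

def mask_tokens(tokens, rate=0.15):
    # Replace a random subset of tokens with the mask symbol.
    return [MASK_TOKEN if random.random() < rate else t for t in tokens]

def replace_tokens(tokens, vocab, rate=0.15):
    # Swap a random subset of tokens for random vocabulary items.
    return [random.choice(vocab) if random.random() < rate else t for t in tokens]

def reorder_tokens(tokens, span=3):
    # Shuffle the tokens inside one randomly chosen local window.
    if len(tokens) <= span:
        return tokens[:]
    start = random.randrange(len(tokens) - span)
    window = tokens[start:start + span]
    random.shuffle(window)
    return tokens[:start] + window + tokens[start + span:]

def corrupt(tokens, vocab):
    # Randomly pick exactly one corruption method per training example.
    strategy = random.choice(["mask", "replace", "reorder"])
    if strategy == "mask":
        return mask_tokens(tokens)
    if strategy == "replace":
        return replace_tokens(tokens, vocab)
    return reorder_tokens(tokens)

# Toy usage: the model would be trained to reconstruct `tokens` from `corrupted`.
vocab = ["the", "a", "model", "text", "token", "learns"]
tokens = "the model learns to denoise corrupted text".split()
corrupted = corrupt(tokens, vocab)
print(corrupted)

During training, the model is optimized to recover the original sequence from the corrupted one, so over many examples it is exposed to all three noise types rather than specializing in just one.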
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A research team aims to pre-train a language model to be highly robust against a wide variety of real-world text errors, including typos, missing words, and jumbled phrases. Which of the following input corruption strategies during pre-training is most likely to achieve this goal of general robustness?
Evaluating a Pre-training Strategy for a Code Generation Model