Learn Before
Optimizing Training Efficiency
Based on the following scenario, propose a specific change to the team's input corruption strategy that would likely increase computational efficiency. Explain the principle that makes your proposed change effective.
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A machine learning engineer is training a model to reconstruct a document from a corrupted version. They are considering two different strategies for creating the corrupted input:
- Strategy A: Replace 15% of the words in the document, chosen at random, each with a single [MASK] token.
- Strategy B: Replace three separate, contiguous spans of words (which together make up 15% of the document's total words) with a single [SPAN] token for each span.
Assuming all other factors are equal, which strategy is likely to result in a more computationally efficient training process, and why?
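To make the comparison concrete, the following is a minimal sketch of the two corruption strategies applied to a toy 100-word document. The function names `mask_tokens` and `mask_spans`, the span positions, and the toy document are illustrative assumptions, not part of the original scenario. The sketch only reports how long the corrupted input is under each strategy.

```python
import random

def mask_tokens(words, ratio=0.15, seed=0):
    """Strategy A (sketch): replace randomly chosen words one-for-one with [MASK]."""
    rng = random.Random(seed)
    k = round(len(words) * ratio)
    chosen = set(rng.sample(range(len(words)), k))
    return ["[MASK]" if i in chosen else w for i, w in enumerate(words)]

def mask_spans(words, spans):
    """Strategy B (sketch): replace each (start, end) span with one [SPAN] token.

    Assumes the spans are non-overlapping, half-open index ranges.
    """
    out, pos = [], 0
    for start, end in sorted(spans):
        out.extend(words[pos:start])  # keep the words before the span
        out.append("[SPAN]")          # the whole span collapses to one token
        pos = end
    out.extend(words[pos:])           # keep the words after the last span
    return out

# Toy document of 100 placeholder words (hypothetical data).
words = [f"w{i}" for i in range(100)]

corrupted_a = mask_tokens(words)
# Three spans of 5 words each, 15 words in total (15% of the document).
corrupted_b = mask_spans(words, [(10, 15), (40, 45), (80, 85)])

print(len(corrupted_a), len(corrupted_b))  # 100 88
```

In this sketch, Strategy A leaves the input length unchanged, while Strategy B shortens it by the number of masked words minus the number of spans. The length of the corrupted input is the quantity to compare when reasoning about the cost of each training step.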
Optimizing Training Efficiency
Efficiency vs. Learning Trade-off in Denoising