Learn Before
Efficiency vs. Learning Trade-off in Denoising
When using a denoising autoencoding objective, a common technique to improve computational efficiency is to replace a contiguous span of several words in the input text with a single placeholder token. Explain the trade-off involved in choosing the length of the text spans to replace. Specifically, what is the benefit of using longer spans, and what is a potential negative consequence for the model's learning if the spans become too long?
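To make the efficiency side of this trade-off concrete, here is a hypothetical Python sketch of the sequence-length arithmetic (the function name and parameters are assumptions, not from the question): collapsing a span of several masked words into one placeholder shortens the corrupted input, so longer spans yield shorter and therefore cheaper training sequences, while leaving the model fewer distinct reconstruction targets per placeholder.

```python
def corrupted_length(n_tokens: int, corruption_rate: float, span_len: int) -> int:
    """Length of the input after span corruption (hypothetical sketch).

    A fixed fraction of the words is masked; the masked words are grouped
    into contiguous spans of span_len, and each span is collapsed into a
    single placeholder token.
    """
    n_masked = int(n_tokens * corruption_rate)
    n_spans = max(1, n_masked // span_len)  # one placeholder per span
    return n_tokens - n_spans * span_len + n_spans

# 1000-word document, 15% corruption:
print(corrupted_length(1000, 0.15, 1))   # 1000: single-word masks, no shortening
print(corrupted_length(1000, 0.15, 3))   # 900: 50 spans of 3 words each
print(corrupted_length(1000, 0.15, 10))  # 865: 15 spans of 10 words each
```

The shorter the corrupted input, the less computation each training step needs, which is the benefit side of the trade-off the question asks about.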
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A machine learning engineer is training a model to reconstruct a document from a corrupted version. They are considering two different strategies for creating the corrupted input:
- Strategy A: Replace 15% of the words in the document, chosen at random, each with a single [MASK] token.
- Strategy B: Replace three separate, contiguous spans of words (which together make up 15% of the document's total words) with a single [SPAN] token for each span.
Assuming all other factors are equal, which strategy is likely to result in a more computationally efficient training process, and why?
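A minimal sketch contrasting the two strategies (the token lists, placeholder names, and chosen positions are illustrative assumptions, not part of the question):

```python
def strategy_a(tokens, positions):
    # Each selected word gets its own [MASK]; sequence length is unchanged.
    return ["[MASK]" if i in positions else t for i, t in enumerate(tokens)]

def strategy_b(tokens, spans):
    # Each contiguous (start, end) span collapses into one [SPAN] token,
    # so the corrupted input is shorter than the original.
    out, i = [], 0
    starts = {start: end for start, end in spans}
    while i < len(tokens):
        if i in starts:
            out.append("[SPAN]")
            i = starts[i]  # skip to the end of the span
        else:
            out.append(tokens[i])
            i += 1
    return out

doc = ["w%d" % k for k in range(20)]   # 20-word document
a = strategy_a(doc, {3, 9, 15})        # 15% of words masked individually
b = strategy_b(doc, [(3, 6)])          # one 3-word span collapsed
print(len(a), len(b))                  # prints: 20 18
```

Strategy B's shorter corrupted input is what makes it cheaper per training step, since the cost of processing a sequence grows with its length.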
Optimizing Training Efficiency