Analyzing Text Corruption Techniques
A data scientist is preparing a sentence to be used as a training example for a language model. They begin with the original, uncorrupted sentence:
The puppies are frolicking outside the house .
They then create the following modified version:
The puppies are [MASK] outside house .
Based on this information, identify the two distinct text modification techniques that were applied to the original sentence to create the modified version.
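Comparing the two sentences token by token shows both operations. "frolicking" is replaced by [MASK], which is token masking: the sequence length stays the same. "the" before "house" disappears with no placeholder, which is token deletion: the sequence gets shorter. The minimal sketch below reproduces the modified sentence from the original. It assumes whitespace tokenization. The helper names `mask_tokens` and `delete_tokens` are hypothetical, and the positions are chosen by hand to match this example, whereas a real pipeline would typically sample them at random.

```python
MASK = "[MASK]"

def mask_tokens(tokens, positions):
    """Token masking (hypothetical helper): replace each chosen token with [MASK].

    The output has the same length as the input.
    """
    return [MASK if i in positions else tok for i, tok in enumerate(tokens)]

def delete_tokens(tokens, positions):
    """Token deletion (hypothetical helper): drop each chosen token entirely.

    The output is shorter than the input, and no placeholder marks the gap.
    """
    return [tok for i, tok in enumerate(tokens) if i not in positions]

# Assumption: simple whitespace tokenization.
original = "The puppies are frolicking outside the house .".split()
# Indices: 0 The, 1 puppies, 2 are, 3 frolicking, 4 outside, 5 the, 6 house, 7 .

masked = mask_tokens(original, {3})        # "frolicking" -> [MASK]
corrupted = delete_tokens(masked, {5})     # remove "the"
print(" ".join(corrupted))                 # The puppies are [MASK] outside house .
```

Deletion is harder for a model to undo than masking. A [MASK] token marks where something is missing, but with deletion the model must also work out where a token was removed.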
Related
Example of BERT-style Input for Masked Language Modeling
Example Comparison of Token Masking and Token Deletion
Consider the following two text sequences:
Sequence A: 'The puppies are frolicking outside the house .'
Sequence B: 'The puppies are [MASK] outside [MASK] house .'
In the context of preparing data to train a language model, what is the primary purpose of creating Sequence B from Sequence A?
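Sequence B is Sequence A with two tokens masked and nothing deleted. In BERT-style masked language modeling, the model is trained to recover the original tokens at the masked positions from the surrounding context. The minimal sketch below builds such an (input, target) pair. It assumes whitespace tokenization and uses `None` to mark positions that contribute no loss. That marker is a convention of this sketch only; PyTorch's `CrossEntropyLoss`, for example, ignores label ids equal to `-100` by default. The helper name `make_mlm_example` is hypothetical.

```python
MASK = "[MASK]"

def make_mlm_example(tokens, positions):
    """Build a masked-LM training pair (hypothetical helper).

    inputs: the corrupted sequence, with [MASK] at each chosen position.
    labels: the original token at each masked position, None elsewhere,
            so a loss would be computed only where a token must be recovered.
    """
    inputs = [MASK if i in positions else tok for i, tok in enumerate(tokens)]
    labels = [tok if i in positions else None for i, tok in enumerate(tokens)]
    return inputs, labels

# Assumption: simple whitespace tokenization.
seq_a = "The puppies are frolicking outside the house .".split()
seq_b, labels = make_mlm_example(seq_a, {3, 5})

print(" ".join(seq_b))                         # The puppies are [MASK] outside [MASK] house .
print([t for t in labels if t is not None])    # ['frolicking', 'the']
```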
In the context of preparing data for language model training, an original sentence is often intentionally corrupted. Match each type of text sequence with its corresponding example.