Identifying Text Corruption Methods
An original sentence was altered using two different methods to create the two corrupted versions shown below.
Original: 'The quick brown fox jumps over the lazy dog.'
Version A: 'The quick [MASK] fox [MASK] over the lazy dog.'
Version B: 'The quick fox over the lazy dog.'
For each version (A and B), identify the specific data corruption method used and briefly explain the primary evidence in the text that supports your identification.
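To make the comparison concrete, the following is a minimal sketch that reproduces both versions from the original sentence. It assumes the two methods are token masking (selected words replaced by a placeholder) and token deletion (selected words removed outright), and that words are split on whitespace. The helper names `mask_tokens` and `delete_tokens` and the chosen positions are illustrative, not taken from the exercise.

```python
MASK = "[MASK]"  # placeholder symbol, as it appears in Version A


def mask_tokens(tokens, positions):
    """Replace the tokens at the given positions with MASK; length is unchanged."""
    return [MASK if i in positions else tok for i, tok in enumerate(tokens)]


def delete_tokens(tokens, positions):
    """Drop the tokens at the given positions; the result is shorter."""
    return [tok for i, tok in enumerate(tokens) if i not in positions]


# Whitespace tokenization is an assumption made for illustration only.
original = "The quick brown fox jumps over the lazy dog.".split()
positions = {2, 4}  # indices of 'brown' and 'jumps'

version_a = " ".join(mask_tokens(original, positions))
version_b = " ".join(delete_tokens(original, positions))

print(version_a)  # The quick [MASK] fox [MASK] over the lazy dog.
print(version_b)  # The quick fox over the lazy dog.
```

Under these assumptions, the evidence to look for is the sentence length and the presence of placeholders: the masked output keeps all nine positions and marks where words were removed, while the deleted output has seven words and gives no indication of where the gaps were.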
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An engineer is preparing text data for a model. The process involves taking an original sentence and creating a 'damaged' version by altering some of its words. The engineer observes that the damaged sentences are consistently shorter in length (fewer total words) than their original counterparts. Which of the following data alteration methods is the engineer most likely using?
Choosing a Data Corruption Strategy