Learn Before
Analysis of Input Corruption Impact
Consider two different methods for corrupting an input sentence during a language model's training. Method 1 replaces certain words with a generic placeholder symbol, keeping the sentence length unchanged. Method 2 removes certain words entirely, producing a shorter sentence. Analyze the unique challenge that Method 2 poses for a model learning the grammatical structure of a language, compared to Method 1.
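The contrast can be made concrete with a minimal sketch. This is an illustrative implementation, not code from any particular training library: `token_masking` stands in for Method 1 (a placeholder symbol preserves every position) and `token_deletion` for Method 2 (positions vanish, so the model must infer both where words are missing and what they were). The function names, the `[MASK]` symbol, and the corruption rate are all assumptions chosen for illustration.

```python
import random

def token_masking(tokens, rate=0.3, mask="[MASK]", seed=0):
    # Method 1: replace selected tokens with a placeholder.
    # Sequence length is preserved, so missing positions are explicit.
    rng = random.Random(seed)
    return [mask if rng.random() < rate else t for t in tokens]

def token_deletion(tokens, rate=0.3, seed=0):
    # Method 2: drop selected tokens entirely.
    # Sequence length shrinks; the model receives no signal about
    # WHERE words were removed, only a shorter, still-fluent-looking span.
    rng = random.Random(seed)
    return [t for t in tokens if rng.random() >= rate]

sentence = "The quick brown fox jumps over the lazy dog .".split()
masked = token_masking(sentence)
deleted = token_deletion(sentence)
assert len(masked) == len(sentence)   # positions preserved under masking
assert len(deleted) <= len(sentence)  # positions lost under deletion
```

Under masking, the placeholder marks each gap, so the model only predicts the missing content; under deletion, the model must additionally locate the gaps, which is the extra grammatical-structure challenge the prompt asks about.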
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Example Comparison of Token Masking and Token Deletion
A language model is being trained to reconstruct an original text sequence from a corrupted version. During one training step, the original input is 'The quick brown fox jumps over the lazy dog.' and the corrupted input given to the model is 'The quick fox over the lazy dog.'. Based on this example, which specific input corruption technique was applied?
Analysis of Input Corruption Impact
When applying the token deletion method to corrupt an input sequence for model training, the length of the resulting sequence is identical to the original sequence.
Example of Token Deletion in Denoising Autoencoding