Example Comparison of Token Masking and Token Deletion
The distinction between token masking and token deletion can be demonstrated with the original sequence: 'The puppies are frolicking outside the house .'. When token masking is applied to create a noisy sequence, selected tokens such as 'frolicking' and 'the' are replaced with a special symbol, resulting in: 'The puppies are [MASK] outside [MASK] house .'. Conversely, token deletion removes the selected tokens ('frolicking' and 'the') from the sequence entirely, yielding the shorter sequence: 'The puppies are outside house .'.
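The following is a minimal Python sketch of the two corruption techniques, assuming simple whitespace tokenization; the function names token_masking and token_deletion, the corruption ratio, and the fixed positions used to reproduce the running example are illustrative choices, not details from the source.

```python
import random

MASK = "[MASK]"

def token_masking(tokens, ratio=0.25, rng=random):
    """Replace a random subset of tokens with [MASK]; length is preserved."""
    k = max(1, int(len(tokens) * ratio))
    hits = set(rng.sample(range(len(tokens)), k))
    return [MASK if i in hits else tok for i, tok in enumerate(tokens)]

def token_deletion(tokens, ratio=0.25, rng=random):
    """Remove a random subset of tokens entirely; the sequence gets shorter."""
    k = max(1, int(len(tokens) * ratio))
    hits = set(rng.sample(range(len(tokens)), k))
    return [tok for i, tok in enumerate(tokens) if i not in hits]

# Reproduce the running example by corrupting the same two positions:
# 'frolicking' (index 3) and the second 'the' (index 5).
original = "The puppies are frolicking outside the house .".split()
masked  = [MASK if i in (3, 5) else t for i, t in enumerate(original)]
deleted = [t for i, t in enumerate(original) if i not in (3, 5)]

print(" ".join(masked))   # The puppies are [MASK] outside [MASK] house .
print(" ".join(deleted))  # The puppies are outside house .
print(len(original), len(masked), len(deleted))  # 8 8 6
```

Note the key behavioral difference the lengths expose: the masked sequence keeps the original length, so the model knows where the corrupted positions are, while the deleted sequence is shorter, so the model must also infer where tokens are missing.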
Related
Example Comparison of Token Masking and Token Deletion
A language model is being trained to reconstruct an original text sequence from a corrupted version. During one training step, the original input is 'The quick brown fox jumps over the lazy dog.' and the corrupted input given to the model is 'The quick fox over the lazy dog.'. Based on this example, which specific input corruption technique was applied?
Analysis of Input Corruption Impact
True or false: when the token deletion method is applied to corrupt an input sequence for model training, the length of the resulting sequence is identical to that of the original sequence.
Example of Token Deletion in Denoising Autoencoding
Span Masking
A common technique to create a 'noisy' version of a text sequence for model training involves randomly selecting individual words and replacing each one with a special marker, such as [MASK]. Given the original sentence: 'The quick brown fox jumps over the lazy dog.', which of the following options correctly demonstrates this specific technique?
Identifying an Input Alteration Procedure
A data scientist is preparing text for a model training process. The goal is to corrupt the input by replacing individual words with a special [MASK] marker, while keeping the total number of words (including the markers) the same as the original. Given the original sentence: 'The model must predict the original words from the altered input.', which of the following sentences correctly applies this specific technique?
Example of BERT-style Input for Masked Language Modeling
Example Comparison of Token Masking and Token Deletion
Consider the following two text sequences:
Sequence A: 'The puppies are frolicking outside the house .'
Sequence B: 'The puppies are [MASK] outside [MASK] house .'
In the context of preparing data to train a language model, what is the primary purpose of creating Sequence B from Sequence A?
In the context of preparing data for language model training, an original sentence is often intentionally corrupted. Match each type of text sequence with its corresponding example.
Analyzing Text Corruption Techniques
Learn After
An engineer is preparing text data for a model. The process involves taking an original sentence and creating a 'damaged' version by altering some of its words. The engineer observes that the damaged sentences are consistently shorter in length (fewer total words) than their original counterparts. Which of the following data alteration methods is the engineer most likely using?
Choosing a Data Corruption Strategy
Identifying Text Corruption Methods