Learn Before
Example of Span Masking
To illustrate span masking, consider the original sequence 'The 0 puppies are frolicking outside the house .', where '0' marks the position of a zero-length span rather than a word of the sentence. Applying the corruption method targets two spans. First, the zero-length span indicated by '0' results in the insertion of a [MASK] token at that position. Second, the contiguous span 'frolicking outside the' is replaced in its entirety by a single [MASK] token. The corrupted sequence is therefore 'The [MASK] puppies are [MASK] house .'.
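The example above can be reproduced with a minimal Python sketch. The function name `span_mask`, the whitespace tokenization, and the representation of spans as half-open `(start, end)` token-index pairs are assumptions made for illustration; they are not taken from the course material. A pair with `start == end` stands for a zero-length span.

```python
MASK = "[MASK]"

def span_mask(tokens, spans):
    """Replace each selected span with a single MASK token.

    `spans` holds half-open (start, end) token-index pairs (an illustrative
    convention). A pair with start == end is a zero-length span, which
    inserts MASK before tokens[start] without removing anything.
    """
    out = []
    pos = 0  # index of the next original token not yet copied
    for start, end in sorted(spans):
        if start < pos or end < start or end > len(tokens):
            raise ValueError(f"invalid or overlapping span: {(start, end)}")
        out.extend(tokens[pos:start])  # copy the untouched tokens
        out.append(MASK)               # one MASK per span, whatever its length
        pos = end                      # skip the tokens covered by the span
    out.extend(tokens[pos:])           # copy the remaining tail
    return out

# The sentence from the example, without the '0' position marker.
tokens = "The puppies are frolicking outside the house .".split()

# (1, 1): zero-length span between 'The' and 'puppies'.
# (3, 6): the span 'frolicking outside the'.
corrupted = span_mask(tokens, [(1, 1), (3, 6)])
print(" ".join(corrupted))  # The [MASK] puppies are [MASK] house .
```

Because every span collapses to exactly one [MASK] token, the corrupted sequence does not reveal how many tokens were removed at each masked position.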
References
Reference of Foundations of Large Language Models Course
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Example of Span Masking
Prediction Challenge in Span Masking
An engineer is analyzing a text corruption technique and observes the following input-output pair:
Original Text: 'The very happy dog played in the big yard.' Processed Text: 'The [MASK] dog played [MASK] yard.'
Based on this single example, what can be definitively concluded about the rules of this corruption technique?
Reconstructing Masked Spans
Consider a text corruption technique where non-overlapping segments of text are randomly selected and each chosen segment is replaced by a single [MASK] token. According to this technique, if the three-word segment 'the big dog' is selected from a sentence, it would be replaced by '[MASK] [MASK] [MASK]'.
A text corruption technique involves selecting one or more non-overlapping segments of text (spans) and replacing each entire segment with a single [MASK] token. This technique also allows for selecting zero-length spans, which results in inserting a [MASK] token. Given the original sentence: 'The quick brown fox jumps over the lazy dog.', which of the following outputs represents an invalid application of this technique?
Learn After
A text corruption technique involves selecting non-overlapping segments of text and replacing each segment with a single [MASK] token. This technique also allows for selecting zero-length segments, which results in the insertion of a [MASK] token at that position. Given the original sentence 'The quick brown fox jumps over the lazy dog.' and two segments selected for corruption (the segment 'brown fox' and a zero-length segment between 'the' and 'lazy'), what is the resulting corrupted sentence?
Deconstructing a Masked Sentence
Analyzing a Text Corruption Process