An input sequence of 200 tokens is processed during a model's self-supervised pre-training. The procedure first selects 15% of the tokens for modification. Of this selected group, 80% are replaced with a special mask symbol, 10% are replaced with a different, random token, and the final 10% are left as they are. Given this process, which statement accurately describes the state of the 200-token sequence after this modification step?
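The arithmetic behind the question (15% of 200 = 30 tokens selected; of those, 24 masked, 3 randomized, 3 left unchanged) can be checked with a short sketch of BERT-style masking. This is a minimal illustration, not the original implementation: the function name `mlm_corrupt` and the deterministic 80/10/10 partition of the selected positions are assumptions made for clarity.

```python
import random

MASK = "[MASK]"

def mlm_corrupt(tokens, vocab, select_rate=0.15, seed=0):
    """Sketch of BERT-style corruption with a deterministic 80/10/10 split.

    Of the selected positions: 80% -> mask symbol, 10% -> random vocab
    token, 10% left as-is (though still predicted during training).
    """
    rng = random.Random(seed)
    n_select = round(len(tokens) * select_rate)   # 15% of the sequence
    positions = rng.sample(range(len(tokens)), n_select)
    n_mask = round(n_select * 0.8)                # 80% of selected
    n_rand = round(n_select * 0.1)                # 10% of selected
    out = list(tokens)
    for i, pos in enumerate(positions):
        if i < n_mask:
            out[pos] = MASK                       # replace with mask symbol
        elif i < n_mask + n_rand:
            out[pos] = rng.choice(vocab)          # replace with random token
        # remaining 10% of selected positions: keep the original token
    return out, positions

# For a 200-token sequence: 30 selected -> 24 masked, 3 random, 3 kept,
# so 170 tokens are never selected and 173 end up identical to the input.
```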
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Computing Sciences
Application in Bloom's Taxonomy
Related
A language model is pre-trained with a masked language modeling objective. Arrange the following stages of its data processing and training pipeline in chronological order.
Analyzing a Pre-training Pipeline Implementation