Case Study

Evaluating a Pre-training Data Corruption Step

A language model's pre-training pipeline selects 15% of the tokens in a sequence and then applies the 80/10/10 rule to only those selected tokens: 80% of them are replaced with a special [MASK] token, 10% are replaced with a different random token, and 10% are left unchanged. Given the following case, evaluate the correctness of the output. Is it a valid transformation according to the standard procedure? Explain your reasoning.
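To make the procedure concrete, here is a minimal sketch of the corruption step. All names (`corrupt_for_mlm`, `select_rate`, the toy vocabulary) are illustrative, not from the source; the selection count and the 80/10/10 split follow the description above.

```python
import random

def corrupt_for_mlm(tokens, vocab, mask_token="[MASK]", select_rate=0.15, seed=None):
    """Select ~15% of positions, then apply the 80/10/10 rule to them.

    Returns the corrupted sequence and the list of selected positions
    (the positions the model would be trained to predict).
    """
    rng = random.Random(seed)
    n = len(tokens)
    k = max(1, round(select_rate * n))          # number of positions to select (~15%)
    selected = sorted(rng.sample(range(n), k))  # sample positions without replacement
    out = list(tokens)
    for i in selected:
        p = rng.random()
        if p < 0.8:
            out[i] = mask_token                 # 80%: replace with [MASK]
        elif p < 0.9:
            out[i] = rng.choice(vocab)          # 10%: replace with a random vocab token
        # else: 10% — leave the original token unchanged
    return out, selected
```

Note that even an "unchanged" selected token still counts toward the 15%: the model must predict it, which is exactly why a transformation with no visible [MASK] at some selected position can still be valid.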


Updated 2025-10-04

Tags

Ch.1 Pre-training - Foundations of Large Language Models