Case Study

Critiquing a Pre-training Implementation

A data scientist is preparing a text sequence of 200 tokens for a self-supervised pre-training task. Their script correctly selects 30 tokens (15%) for the model to predict. However, the script then modifies the sequence by replacing all 30 of these selected tokens with a special [MASK] symbol. Based on the standard token modification strategy, what is the primary issue with this implementation?
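For reference, the standard strategy the question alludes to (introduced with BERT) does not replace every selected token with [MASK]: of the ~15% of positions chosen for prediction, 80% are replaced with [MASK], 10% are replaced with a random vocabulary token, and 10% are left unchanged. Below is a minimal sketch of that 80/10/10 rule; the function name `mask_tokens`, the `MASK_TOKEN` placeholder, and the toy vocabulary are illustrative assumptions, not part of the original question.

```python
import random

MASK_TOKEN = "[MASK]"  # assumed placeholder for the tokenizer's mask symbol


def mask_tokens(tokens, vocab, select_rate=0.15, seed=None):
    """Corrupt `tokens` with the standard 80/10/10 rule.

    Returns the corrupted sequence and a {position: original token}
    map of prediction targets.
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    targets = {}

    n_select = max(1, round(len(tokens) * select_rate))
    for pos in rng.sample(range(len(tokens)), n_select):
        targets[pos] = tokens[pos]
        roll = rng.random()
        if roll < 0.8:
            corrupted[pos] = MASK_TOKEN         # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted[pos] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: leave the original token in place

    return corrupted, targets


# Toy usage (assumed data): a 200-token sequence yields 30 prediction
# positions, but only about 24 of them (80%) actually carry [MASK].
vocab = [f"tok{i}" for i in range(1000)]
sequence = random.choices(vocab, k=200)
corrupted, targets = mask_tokens(sequence, vocab, seed=0)
print(len(targets))                             # 30 positions selected
print(sum(t == MASK_TOKEN for t in corrupted))  # roughly 24 masked
```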

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Application in Bloom's Taxonomy