Short Answer

Analyzing Text Corruption Strategies

Consider the following three methods for altering an input text sequence during the pre-training of a language model:

  1. Randomly replacing 15% of the words in the sequence with a special placeholder symbol.
  2. Randomly changing the order of sentences within the sequence.
  3. Deleting 15% of the words at random positions throughout the sequence.

Analyze these methods and identify which one is uniquely applicable to texts composed of multiple sentences. Justify your choice by explaining why the structure of a multi-sentence text is essential for this specific method to be applied, and why the other two methods do not share this requirement.
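To make the three corruptions concrete, here is a minimal Python sketch of each one. The function names, the `[MASK]` placeholder string, and the choice to operate on whitespace-split words are illustrative assumptions, not part of the question; note that `permute_sentences` is the only transform whose input must already be segmented into multiple sentences.

```python
import random

def mask_tokens(words, rate=0.15, mask="[MASK]"):
    # Method 1: replace a random 15% of words with a placeholder symbol.
    # Works on any sequence of words, regardless of sentence structure.
    n = max(1, int(len(words) * rate))
    chosen = set(random.sample(range(len(words)), n))
    return [mask if i in chosen else w for i, w in enumerate(words)]

def permute_sentences(sentences):
    # Method 2: shuffle the order of sentences. Only meaningful when the
    # input contains more than one sentence -- with a single sentence
    # there is no order to change.
    shuffled = sentences[:]
    random.shuffle(shuffled)
    return shuffled

def delete_tokens(words, rate=0.15):
    # Method 3: delete a random 15% of words. Like masking, this needs
    # only a word sequence, not sentence boundaries.
    n = max(1, int(len(words) * rate))
    chosen = set(random.sample(range(len(words)), n))
    return [w for i, w in enumerate(words) if i not in chosen]
```

As a usage sketch: `mask_tokens` and `delete_tokens` accept a flat word list, while `permute_sentences` requires the text to be pre-split into a list of sentences, which is exactly the structural requirement the question asks you to identify.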


Updated 2025-10-06


Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models Course

Analysis in Bloom's Taxonomy