Representing Masked Spans with Sentinel Tokens

An alternative approach to handling consecutive masked tokens is to treat them as a single span. Following Raffel et al. (2020), a unique sentinel token, such as [X], [Y], or [Z], replaces each run of adjacent masked tokens in the corrupted input, creating placeholder slots. The model is trained to fill these slots using the surrounding context: the target sequence lists each sentinel followed by the original tokens it replaced. Consolidating several masked tokens into a single placeholder shortens the training sequences, making training more computationally efficient.
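
To make the slot-filling setup concrete, here is a minimal Python sketch under simplifying assumptions: the input is whitespace-tokenized, the masked positions are given as a set of indices, and the sentinels reuse the [X], [Y], [Z] names above (the function name corrupt_spans is illustrative, not from Raffel et al.). Following the T5 recipe, a final sentinel is appended to the target to mark the end of the predicted spans.

```python
def corrupt_spans(tokens, masked_positions):
    """Replace each run of consecutive masked tokens with one sentinel.

    Returns the corrupted input and the training target: each sentinel
    followed by the original tokens it stands in for.
    """
    sentinels = iter(f"[{c}]" for c in "XYZ")  # one unique sentinel per span
    masked = set(masked_positions)
    corrupted, target = [], []
    i = 0
    while i < len(tokens):
        if i in masked:
            sentinel = next(sentinels)
            corrupted.append(sentinel)    # the whole span collapses to one slot
            target.append(sentinel)
            while i < len(tokens) and i in masked:
                target.append(tokens[i])  # record the dropped-out tokens
                i += 1
        else:
            corrupted.append(tokens[i])
            i += 1
    target.append(next(sentinels))  # final sentinel marks the end of targets
    return corrupted, target


tokens = "Thank you for inviting me to your party last week".split()
inp, tgt = corrupt_spans(tokens, masked_positions={2, 3, 8})
print(" ".join(inp))  # Thank you [X] me to your party [Y] week
print(" ".join(tgt))  # [X] for inviting [Y] last [Z]
```

Note how the efficiency gain arises: the corrupted input carries one sentinel per span rather than one mask token per dropped word, so both the input and the target are shorter than the original sequence.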

