Learn Before
Representing Masked Spans with Sentinel Tokens
An alternative approach to handling consecutive masked tokens is to treat them as a single span. Following the methodology of Raffel et al. (2020), a unique sentinel token, such as [X], [Y], or [Z], replaces one or more adjacent masked tokens in the corrupted input sequence, effectively creating placeholder slots. The model's training task is then to fill these slots with the correct original tokens using the surrounding context. A significant advantage of consolidating multiple masks into a single placeholder is that the sequences used during training become shorter, leading to more computationally efficient training.
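To make the slot-filling setup concrete, the following Python sketch builds both the corrupted input and its training target for the sentence used in the Learn After example below. The function name, the whitespace tokenization, and the bracketed sentinel strings are illustrative assumptions for readability; T5 itself operates on subword tokens with tokenizer-specific sentinel IDs.

```python
# Minimal sketch of span corruption with sentinel tokens (Raffel et al., 2020).
# Assumptions: whitespace tokenization and "[X]"/"[Y]"/"[Z]" sentinel strings,
# chosen to match this card's notation rather than any real tokenizer.

def corrupt_with_sentinels(tokens, masked_spans):
    """Replace each contiguous masked span with one unique sentinel token.

    tokens       -- list of input tokens
    masked_spans -- sorted, non-overlapping (start, end) pairs, end exclusive
    Returns (corrupted_input, target), both as token lists.
    """
    sentinels = ["[X]", "[Y]", "[Z]"]
    corrupted, target = [], []
    prev_end = 0
    for sentinel, (start, end) in zip(sentinels, masked_spans):
        corrupted.extend(tokens[prev_end:start])  # keep the unmasked context
        corrupted.append(sentinel)                # one placeholder per span
        target.append(sentinel)                   # target pairs each sentinel
        target.extend(tokens[start:end])          # with the tokens it hides
        prev_end = end
    corrupted.extend(tokens[prev_end:])
    return corrupted, target

tokens = "The new algorithm significantly improves performance on the benchmark".split()
inp, tgt = corrupt_with_sentinels(tokens, [(3, 5), (6, 9)])
print(" ".join(inp))  # The new algorithm [X] performance [Y]
print(" ".join(tgt))  # [X] significantly improves [Y] on the benchmark
```

Note how the corrupted input is shorter than the original sentence, and the target lists only the sentinels followed by the tokens each one replaced; this shortening is the efficiency gain described above.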
References
Reference of Foundations of Large Language Models Course
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Example of Denoising Task with Consecutive Token Masking
Representing Masked Spans with Sentinel Tokens
A language model is being trained to predict masked words in a text. Consider two different masking strategies:
Strategy 1: 15% of the words in a sentence are masked individually at random positions. Example:
The quick [MASK] fox jumps [MASK] the lazy dog.
Strategy 2: A contiguous span of several words is masked. Example:
The quick [MASK] [MASK] [MASK] over the lazy dog.
How does using Strategy 2 (masking a contiguous span) primarily alter the learning challenge for the model compared to Strategy 1?
Analyzing a Masked Language Modeling Task
Analyzing Model Performance Discrepancy
Analyzing the Challenge of Consecutive Masking
Learn After
Example of Representing Masked Spans with Sentinel Tokens
A language model is given the input sentence: 'The new algorithm significantly improves performance on the benchmark.' In this instance, the consecutive words 'significantly improves' and 'on the benchmark' are both masked for a training task. How would this input be correctly represented if the modeling approach replaces each contiguous masked span with a single, unique placeholder token?
In a text processing approach where a single, unique placeholder replaces a span of one or more masked words, it is permissible to use the same placeholder (e.g., [X]) to represent two different, non-overlapping masked spans within the same input sentence.
Consider the following input sequence where several consecutive tokens have been masked:
The model was trained [MASK] [MASK] a large dataset and then [MASK] [MASK] [MASK] on a specific task.
Rewrite this sequence by replacing each contiguous span of masked tokens with a single, unique sentinel token, starting with [X]. Your answer should be the complete, rewritten sequence: ____