Learn Before
Span Masking
Span masking is an input corruption technique in which non-overlapping spans of tokens are randomly sampled from a sequence, and each span is replaced by a single [MASK] token. This approach uniquely accommodates spans of length 0, where a [MASK] token is simply inserted at a chosen position in the sequence without removing any original tokens.
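To make the sampling and replacement steps concrete, here is a minimal Python sketch of span masking; the function name, parameter choices, and rejection-sampling loop are illustrative assumptions, not the course's reference implementation.

import random

MASK = "[MASK]"

def span_mask(tokens, n_spans=2, max_span_len=3, seed=0):
    """Replace randomly sampled, non-overlapping spans with a single
    [MASK] token each; a zero-length span inserts a [MASK] at the
    chosen position without removing any original tokens."""
    rng = random.Random(seed)
    spans = []  # accepted (start, length) pairs
    for _ in range(n_spans):
        start = rng.randint(0, len(tokens))  # start == len(tokens) permits insertion at the end
        length = rng.randint(0, min(max_span_len, len(tokens) - start))
        # Treat every span as occupying at least one position so that
        # accepted spans (including zero-length insertions) stay non-overlapping.
        if any(start < s + max(l, 1) and s < start + max(length, 1) for s, l in spans):
            continue  # overlaps an already accepted span; discard this draw
        spans.append((start, length))
    corrupted = list(tokens)
    # Apply replacements right to left so earlier span indices remain valid.
    for start, length in sorted(spans, reverse=True):
        corrupted[start:start + length] = [MASK]
    return corrupted

print(span_mask("The quick brown fox jumps over the lazy dog .".split(), seed=7))

Depending on the random draws, the output may replace a multi-token span (e.g., 'quick brown' becoming a single [MASK]) or insert a [MASK] between two original tokens; both are valid corruptions under this scheme.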
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Example Comparison of Token Masking and Token Deletion
Span Masking
A common technique to create a 'noisy' version of a text sequence for model training involves randomly selecting individual words and replacing each one with a special marker, such as [MASK]. Given the original sentence: 'The quick brown fox jumps over the lazy dog.', which of the following options correctly demonstrates this specific technique?
Identifying an Input Alteration Procedure
A data scientist is preparing text for a model training process. The goal is to corrupt the input by replacing individual words with a special [MASK] marker, while keeping the total number of words (including the markers) the same as the original. Given the original sentence: 'The model must predict the original words from the altered input.', which of the following sentences correctly applies this specific technique?
Learn After
Example of Span Masking
Prediction Challenge in Span Masking
An engineer is analyzing a text corruption technique and observes the following input-output pair:
Original Text: 'The very happy dog played in the big yard.'
Processed Text: 'The [MASK] dog played [MASK] yard.'
Based on this single example, what can be definitively concluded about the rules of this corruption technique?
Reconstructing Masked Spans
Consider a text corruption technique where non-overlapping segments of text are randomly selected and each chosen segment is replaced by a single [MASK] token. According to this technique, if the three-word segment 'the big dog' is selected from a sentence, it would be replaced by '[MASK] [MASK] [MASK]'.
A text corruption technique involves selecting one or more non-overlapping segments of text (spans) and replacing each entire segment with a single [MASK] token. This technique also allows for selecting zero-length spans, which results in inserting a [MASK] token. Given the original sentence: 'The quick brown fox jumps over the lazy dog.', which of the following outputs represents an invalid application of this technique?