Learn Before
Consider a standard pre-training procedure for a language model where 15% of all tokens in an input are first selected for prediction. Of these selected tokens, 80% are then replaced with a special [MASK] symbol. Based on this procedure, it is guaranteed that for any given input sequence of 1,000 tokens, exactly 120 tokens will be replaced with the [MASK] symbol.
0
1
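The statement hinges on whether the 120 figure is an exact guarantee or only an expected value. Below is a minimal sketch of the masking step (assuming per-token independent random sampling, as in the original BERT recipe; the function name and parameters are illustrative, not from any particular library). The expected number of [MASK] replacements in a 1,000-token sequence is 0.15 × 0.80 × 1,000 = 120, but because the selection and the 80% replacement are applied at random, the realized count fluctuates around 120 rather than being fixed.

```python
import random

def bert_style_mask(tokens, select_prob=0.15, mask_prob=0.80, mask_token="[MASK]"):
    """Select each token for prediction with probability `select_prob`, then
    replace `mask_prob` (80%) of the selected tokens with the [MASK] symbol.
    (The full BERT recipe also swaps 10% of selected tokens for random tokens
    and leaves 10% unchanged; those cases are omitted here because they do
    not affect the [MASK] count.)"""
    output = list(tokens)
    mask_count = 0
    for i in range(len(output)):
        if random.random() < select_prob:       # token chosen for prediction
            if random.random() < mask_prob:     # 80% of chosen tokens get [MASK]
                output[i] = mask_token
                mask_count += 1
    return output, mask_count

# Expected [MASK] count for 1,000 tokens: 0.15 * 0.80 * 1000 = 120.
# The realized count varies from sequence to sequence rather than being
# exactly 120 every time.
tokens = [f"tok{i}" for i in range(1000)]
print([bert_style_mask(tokens)[1] for _ in range(5)])  # e.g. [113, 124, 119, 127, 116]
```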
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Example of Token Masking in a BERT Input Sequence
During a language model's pre-training, a specific strategy is used to alter words that have been chosen for the model to predict. If 10,000 words in a dataset have been chosen for this prediction task, and the strategy dictates that 80% of these chosen words are replaced with a special placeholder symbol, approximately how many of the 10,000 chosen words will be replaced by this symbol?
Verifying a Language Model's Pre-training Data
Consider a standard pre-training procedure for a language model where 15% of all tokens in an input are first selected for prediction. Of these selected tokens, 80% are then replaced with a special [MASK] symbol. Based on this procedure, it is guaranteed that for any given input sequence of 1,000 tokens, exactly 120 tokens will be replaced with the [MASK] symbol.