Example

Example of Masking a Bilingual Sentence Pair

This example applies token masking to a bilingual sentence pair. The aligned Chinese and English sentences are first packed into a single sequence: [CLS] 鲸鱼 是 哺乳 动物 。 [SEP] Whales are mammals . [SEP]. A certain percentage of the tokens is then replaced with the [MASK] symbol, producing a corrupted input such as [CLS] [MASK] 是 [MASK] 动物 。 [SEP] Whales [MASK] [MASK] . [SEP]. The model's task is to predict the original tokens at the masked positions: '鲸鱼', '哺乳', 'are', and 'mammals'.
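The masking step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper names choose_mask_positions and apply_mask, the whitespace tokenization, and the 15% default rate are assumptions made for this sketch. The masked positions are passed in explicitly so that the output reproduces the corrupted sequence shown above.

```python
import random

# Special tokens are never selected for masking (an assumption of this sketch).
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[MASK]"}


def choose_mask_positions(tokens, mask_rate=0.15, seed=None):
    """Randomly pick positions of non-special tokens to mask.

    mask_rate is a hypothetical default; at least one position is chosen.
    """
    rng = random.Random(seed)
    candidates = [i for i, tok in enumerate(tokens) if tok not in SPECIAL_TOKENS]
    k = max(1, round(len(candidates) * mask_rate))
    return sorted(rng.sample(candidates, k))


def apply_mask(tokens, positions):
    """Replace the tokens at the given positions with [MASK].

    Returns the corrupted sequence and a dict mapping each masked
    position to the original token the model must predict.
    """
    corrupted = list(tokens)
    targets = {}
    for i in positions:
        targets[i] = tokens[i]
        corrupted[i] = "[MASK]"
    return corrupted, targets


# The packed bilingual pair, split on whitespace for illustration only.
tokens = "[CLS] 鲸鱼 是 哺乳 动物 。 [SEP] Whales are mammals . [SEP]".split()

# Positions 1, 3, 8, 9 correspond to 鲸鱼, 哺乳, are, mammals.
corrupted, targets = apply_mask(tokens, [1, 3, 8, 9])
print(" ".join(corrupted))
print(list(targets.values()))
```

In practice the positions would come from a random selection such as choose_mask_positions(tokens, mask_rate=0.15) rather than being fixed by hand; the fixed list is used here only to match the example.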


Updated 2026-04-18


Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences