Learn Before
Example of a Packed Bilingual Sentence Sequence
To prepare bilingual data for a model, an aligned sentence pair, such as the Chinese sentence '鲸鱼 是 哺乳 动物 。' and its English translation 'Whales are mammals .', is concatenated into a single input sequence. Special tokens structure this sequence: [CLS] marks the beginning, one [SEP] separates the two sentences, and a second [SEP] marks the end. The final packed sequence is: [CLS] 鲸鱼 是 哺乳 动物 。 [SEP] Whales are mammals . [SEP]
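
The packing step described above can be sketched in a few lines of Python. This is a minimal illustration, not a specific library's API: the function name `pack_bilingual_pair` and its parameters are hypothetical, and the sentences are assumed to be pre-tokenized with spaces, as in the example.

```python
def pack_bilingual_pair(source_tokens, target_tokens,
                        cls_token="[CLS]", sep_token="[SEP]"):
    """Concatenate an aligned sentence pair into one token sequence.

    Hypothetical helper: [CLS] opens the sequence, one [SEP] separates
    the two sentences, and a second [SEP] closes the sequence.
    """
    return [cls_token, *source_tokens, sep_token, *target_tokens, sep_token]


# Assumption: both sentences are already tokenized and space-delimited.
zh_tokens = "鲸鱼 是 哺乳 动物 。".split()
en_tokens = "Whales are mammals .".split()

packed = pack_bilingual_pair(zh_tokens, en_tokens)
print(" ".join(packed))
# [CLS] 鲸鱼 是 哺乳 动物 。 [SEP] Whales are mammals . [SEP]
```

Keeping the special tokens as parameters makes it easy to adapt the sketch to a vocabulary that uses different marker strings.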

Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Related
Example of a Packed Bilingual Sentence Sequence
A machine learning model is being trained to understand the relationship between sentences in two different languages. Which of the following pairs of sentences represents the highest-quality, most precisely aligned example for this training process?
Diagnosing Training Data Issues for a Bilingual Model
A key step in training a model to understand multiple languages is to provide it with correctly matched, or 'aligned,' sentence pairs. Match each English sentence with its direct Chinese translation to form a set of aligned pairs.
Learn After
Example of Masking a Bilingual Sentence Pair
A researcher has an aligned sentence pair: the English sentence 'The sky is blue .' and its Spanish translation 'El cielo es azul .'. To prepare this data for a language model, these two sentences must be combined into a single input sequence using special markers. Which of the following options shows the correct format for this combined sequence?
Correcting a Formatted Input Sequence
You are given an aligned sentence pair: the German sentence 'Katzen sind Tiere .' and its English translation 'Cats are animals .'. Arrange the following components into the correct single input sequence format for a bilingual model.