Learn Before
Example of Masking a Bilingual Sentence Pair
This example applies token masking to a bilingual sentence pair. The aligned Chinese and English sentences are first packed into a single sequence: [CLS] 鲸鱼 是 哺乳 动物 。 [SEP] Whales are mammals . [SEP]. A certain percentage of the tokens is then replaced with the [MASK] symbol, which yields a corrupted input such as [CLS] [MASK] 是 [MASK] 动物 。 [SEP] Whales [MASK] [MASK] . [SEP]. The model's task is to predict the original tokens '鲸鱼', '哺乳', 'are', and 'mammals' at the masked positions.
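The packing and masking steps can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper names (pack_pair, mask_positions, sample_positions) are hypothetical, the masked positions are fixed by hand to reproduce the example above, and the 15% rate used by the random sampler is an assumption, since the text only says "a certain percentage".

```python
import random

# Special symbols that should never be chosen for masking.
SPECIAL = {"[CLS]", "[SEP]", "[MASK]"}


def pack_pair(src_tokens, tgt_tokens):
    """Pack two aligned sentences into one sequence: [CLS] src [SEP] tgt [SEP]."""
    return ["[CLS]"] + list(src_tokens) + ["[SEP]"] + list(tgt_tokens) + ["[SEP]"]


def mask_positions(tokens, positions):
    """Replace the tokens at `positions` with [MASK].

    Returns the corrupted sequence and a dict mapping each masked
    position to the original token the model has to predict.
    """
    corrupted = list(tokens)
    targets = {}
    for i in positions:
        if tokens[i] in SPECIAL:
            raise ValueError(f"position {i} holds a special token")
        targets[i] = tokens[i]
        corrupted[i] = "[MASK]"
    return corrupted, targets


def sample_positions(tokens, rate, rng):
    """Randomly pick positions to mask (the rate is an assumed hyperparameter)."""
    candidates = [i for i, t in enumerate(tokens) if t not in SPECIAL]
    k = max(1, round(rate * len(candidates)))
    return sorted(rng.sample(candidates, k))


packed = pack_pair(["鲸鱼", "是", "哺乳", "动物", "。"],
                   ["Whales", "are", "mammals", "."])

# Positions chosen by hand so the output matches the example in the text.
corrupted, targets = mask_positions(packed, [1, 3, 8, 9])
print(" ".join(corrupted))
print(targets)
```

In training, the hand-picked list would be replaced by something like sample_positions(packed, 0.15, random.Random(0)), so that a different subset of tokens is hidden each time the pair is seen.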

Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Example of Masking a Bilingual Sentence Pair
A researcher has an aligned sentence pair: the English sentence 'The sky is blue .' and its Spanish translation 'El cielo es azul .'. To prepare this data for a language model, these two sentences must be combined into a single input sequence using special markers. Which of the following options shows the correct format for this combined sequence?
Correcting a Formatted Input Sequence
You are given an aligned sentence pair: the German sentence 'Katzen sind Tiere .' and its English translation 'Cats are animals .'. Arrange the following components into the correct single input sequence format for a bilingual model.
Learn After
Transformer Encoding of a Masked Bilingual Sentence Pair
A model is being prepared to understand relationships between aligned sentences in different languages. An input sequence is created by joining a Spanish sentence and its English translation. To train the model to predict missing words, some original words are replaced with a special [MASK] symbol. Given the original packed sequence below, which option correctly demonstrates this replacement process?
Original Sequence: [CLS] El gato se sentó en la alfombra . [SEP] The cat sat on the mat . [SEP]
Optimizing a Model's Training Strategy
Evaluating a Masking Strategy for Specialized Translation