Transformer Encoding of a Masked Bilingual Sentence Pair
This example illustrates the encoding of a masked bilingual (Chinese-English) sentence pair. The input sequence, [CLS] [MASK]是 [MASK]动物。 [SEP] Whales [MASK] [MASK] . [SEP], is first converted into a sequence of token embeddings, e0 through e11. A Transformer Encoder then maps these embeddings to a corresponding sequence of contextualized hidden states, h0 through h11. These hidden states serve as the basis for predicting the original masked tokens: '鲸鱼' ("whales"), '哺乳' (first half of 哺乳动物, "mammals"), 'are', and 'mammals'; the unmasked Chinese sentence is 鲸鱼是哺乳动物。 ("Whales are mammals.").
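The flow from token embeddings e0–e11 to hidden states h0–h11 can be sketched with a single toy self-attention layer standing in for the full Transformer Encoder (a minimal NumPy sketch; the embedding table and projection weights are random placeholders, not a trained model):

```python
import numpy as np

rng = np.random.default_rng(0)

# The 12-token packed sequence from the example
tokens = ["[CLS]", "[MASK]", "是", "[MASK]", "动物", "。",
          "[SEP]", "Whales", "[MASK]", "[MASK]", ".", "[SEP]"]
vocab = {t: i for i, t in enumerate(dict.fromkeys(tokens))}

d_model = 16
emb = rng.normal(size=(len(vocab), d_model))       # token embedding table
e = emb[[vocab[t] for t in tokens]]                # e0..e11, shape (12, d_model)

# One toy single-head self-attention layer: each output mixes all inputs
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
q, k, v = e @ Wq, e @ Wk, e @ Wv
scores = q @ k.T / np.sqrt(d_model)
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)           # row-wise softmax
h = attn @ v                                       # h0..h11, contextualized

print(e.shape, h.shape)  # (12, 16) (12, 16)
```

Each row of h is a weighted mixture of every input position, which is why a hidden state over a [MASK] can draw on both the Chinese and English segments when predicting the missing token.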
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Computing Sciences
Related
Transformer Encoding of a Masked Bilingual Sentence Pair
A model is being prepared to understand relationships between aligned sentences in different languages. An input sequence is created by joining a Spanish sentence and its English translation. To train the model to predict missing words, some original words are replaced with a special [MASK] symbol. Given the original packed sequence below, which option correctly demonstrates this replacement process?
Original Sequence:
[CLS] El gato se sentó en la alfombra . [SEP] The cat sat on the mat . [SEP]
Optimizing a Model's Training Strategy
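The replacement step this question describes can be sketched as BERT-style random masking (a minimal sketch; the 15% masking rate and the rule of never masking [CLS]/[SEP] are standard assumptions, not stated in the question):

```python
import random

def mask_sequence(tokens, rate=0.15, seed=0):
    """Replace ~rate of ordinary tokens with [MASK]; never touch [CLS]/[SEP]."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if tok not in ("[CLS]", "[SEP]") and rng.random() < rate:
            targets[i] = tok          # remember the original token for the loss
            masked.append("[MASK]")
        else:
            masked.append(tok)
    return masked, targets

packed = ("[CLS] El gato se sentó en la alfombra . [SEP] "
          "The cat sat on the mat . [SEP]").split()
masked, targets = mask_sequence(packed)
```

The model is then trained to recover `targets` from the masked sequence; the special tokens that delimit the two languages are left intact so the packed structure is preserved.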
Evaluating a Masking Strategy for Specialized Translation
Transformer Encoder:
Standard Transformer Encoding Procedure
Role of Positional Embeddings in Order-Insensitive Models
Key Hyperparameters of a Transformer Encoder
Transformer Encoding of a Masked Bilingual Sentence Pair
Prefix Tuning
In a sequence-to-sequence model, the input is processed by a stack of six encoder layers that have identical structures. A proposal is made to modify this architecture so that all six encoder layers share the exact same set of weights, with the goal of reducing the total number of model parameters. Which statement best analyzes the primary consequence of this change on the model's ability to process information?
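The parameter saving the proposal targets is easy to quantify (a back-of-the-envelope sketch; d_model=512 and d_ff=2048 are the original Transformer-base values, assumed here for illustration, with biases and LayerNorm omitted):

```python
d_model, d_ff, n_layers = 512, 2048, 6

# Per-layer parameters: self-attention projections (Q, K, V, output)
# plus the two feed-forward matrices
attn_params = 4 * d_model * d_model
ffn_params = d_model * d_ff + d_ff * d_model
per_layer = attn_params + ffn_params

independent = n_layers * per_layer   # six distinct layers
shared = per_layer                   # one weight set reused six times

print(independent, shared, independent // shared)  # → 18874368 3145728 6
```

Sharing cuts the encoder's parameter count sixfold, but each layer can no longer learn a distinct transformation for its depth, which is the trade-off the question asks about.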
A sentence is fed into the encoder side of a Transformer model. Arrange the following steps in the correct sequence to describe how the initial input is processed by the stack of encoders.
Improving a Transformer's Contextual Understanding
Learn After
A model processes the following bilingual sequence where some words have been replaced by a special symbol:
[CLS] The cat sat on the [MASK] . [SEP] Le chat s'est assis sur le [MASK] . [SEP]
The model's encoder computes a final numerical representation (a hidden state) for every symbol in the sequence. Considering the final hidden state calculated for the first [MASK] symbol (in the English part), which statement best analyzes the information it contains?
A language model is tasked with predicting the missing words in a masked bilingual sentence pair. Arrange the following steps in the correct chronological order to describe how the model's encoder processes the input to generate the final representations used for this prediction.
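The cross-segment contextualization this question probes can be demonstrated with a toy one-layer attention model (a minimal NumPy sketch; weights and token indices are random placeholders): changing a token in the second-language segment shifts the hidden state computed for a [MASK] in the first segment.

```python
import numpy as np

def toy_encode(token_ids, emb, Wq, Wk, Wv):
    """One toy self-attention layer: every output mixes every input position."""
    e = emb[token_ids]
    scores = (e @ Wq) @ (e @ Wk).T / np.sqrt(e.shape[1])
    a = np.exp(scores - scores.max(-1, keepdims=True))
    a /= a.sum(-1, keepdims=True)
    return a @ (e @ Wv)

rng = np.random.default_rng(1)
d = 8
emb = rng.normal(size=(20, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

seq = [0, 1, 2, 3, 4, 5, 6]        # pretend index 2 is the English [MASK]
h1 = toy_encode(np.array(seq), emb, Wq, Wk, Wv)
seq[6] = 7                          # change a token in the *other* segment
h2 = toy_encode(np.array(seq), emb, Wq, Wk, Wv)

# The [MASK]'s hidden state shifts even though only a distant token changed
print(np.allclose(h1[2], h2[2]))   # False
```

Because self-attention lets every position attend to every other, the [MASK]'s final hidden state encodes information from both the English and the French segments, not just its local neighbors.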
Analyzing Contextualization in Transformer Encoders