Analyzing Contextualization in Transformer Encoders
A Transformer model processes the masked bilingual input: [CLS] [MASK] 是 [MASK] 动物 。 [SEP] Whales [MASK] [MASK] . [SEP]. The initial representation for the first [MASK] token is its token embedding; call it e1. After the entire sequence passes through the encoder, the final representation at that same position is a contextualized hidden state, h1. Explain why the hidden state h1 is a much better representation for predicting the original word ('鲸鱼', 'whale') than the initial embedding e1 was, and identify the source of the additional information that enriches h1.
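For concreteness, the contrast can be inspected directly. Below is a minimal sketch, assuming the HuggingFace transformers library and the bert-base-multilingual-cased checkpoint (an assumption; the question names no specific model). It compares the embedding-layer vector and the final-layer hidden state at the first [MASK] position.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Assumed multilingual masked LM; any comparable checkpoint would illustrate
# the same point.
MODEL = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL)

# The tokenizer inserts [CLS] and [SEP] itself when given a sentence pair.
inputs = tokenizer("[MASK] 是 [MASK] 动物 。", "Whales [MASK] [MASK] .",
                   return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
p1, p2 = mask_pos[0], mask_pos[1]  # first two [MASK] positions

e1 = out.hidden_states[0][0, p1]   # embedding layer: token + position + segment
h1 = out.hidden_states[-1][0, p1]  # final encoder layer: mixed by self-attention

# e1 is (up to position/segment embeddings) the same vector at every [MASK]
# position, so it encodes nothing about WHICH word is missing.
e2 = out.hidden_states[0][0, p2]
h2 = out.hidden_states[-1][0, p2]
print("cos(e1, e2) =", torch.cosine_similarity(e1, e2, dim=0).item())  # near 1.0
print("cos(h1, h2) =", torch.cosine_similarity(h1, h2, dim=0).item())  # much lower

# The MLM head applied to h1 already ranks plausible fillers, because
# self-attention let this position read '动物' ('animal') and the unmasked
# 'Whales' in the English half of the pair.
top5 = out.logits[0, p1].topk(5).indices.tolist()
print("top predictions at the first [MASK]:", tokenizer.convert_ids_to_tokens(top5))
```

One caveat: in this vocabulary '鲸鱼' is split into two WordPiece tokens, so a single [MASK] cannot literally recover it. The point of the sketch is the contrast itself: the embedding-layer vectors at the two mask positions are nearly identical and therefore uninformative, while the final hidden states diverge sharply because self-attention has mixed in the surrounding Chinese context and the parallel English sentence.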