Short Answer

Analyzing Contextualization in Transformer Encoders

A Transformer encoder processes the masked bilingual input: [CLS] [MASK] 是 [MASK] 动物 。 [SEP] Whales [MASK] [MASK] . [SEP] (是 = "is/are", 动物 = "animal"; the target word 鲸鱼 means "whale"). The initial representation of the first [MASK] token is its input embedding, call it e1. After the full sequence passes through the encoder, the final representation at that same position is a contextualized hidden state, h1. Explain why h1 is a far better representation than e1 for predicting the original word ('鲸鱼'), and identify the source of the additional information that enriches h1.
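To see the contrast concretely, here is a minimal sketch, assuming PyTorch, the Hugging Face transformers library, and the multilingual checkpoint bert-base-multilingual-cased (the model name, indexing, and top-5 probe are illustrative assumptions, not part of the exercise). It extracts the embedding-layer output at the first [MASK] position, which plays the role of e1, and the top-layer hidden state h1, then lets the masked-language-model head score vocabulary words from h1.

```python
# Minimal sketch (assumptions: PyTorch, Hugging Face `transformers`, and the
# checkpoint "bert-base-multilingual-cased") contrasting the near-static
# embedding e1 with the contextualized hidden state h1.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)

# The tokenizer adds [CLS]/[SEP] itself; the two segments mirror the question.
text = "[MASK] 是 [MASK] 动物 。"
pair = "Whales [MASK] [MASK] ."
inputs = tokenizer(text, pair, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states[0] is the embedding-layer output (token + position + segment
# embeddings): every [MASK] gets the same token embedding here, so it carries
# essentially no word identity -- this plays the role of e1.
# hidden_states[-1] is the top encoder layer: the contextualized h1.
mask_id = tokenizer.mask_token_id
pos = (inputs["input_ids"][0] == mask_id).nonzero(as_tuple=True)[0][0].item()

e1 = out.hidden_states[0][0, pos]
h1 = out.hidden_states[-1][0, pos]

# The MLM head predicts the original word from h1, not from e1: h1 has
# absorbed, via self-attention, evidence from 动物, the punctuation, and the
# parallel English segment ("Whales ...").
top5 = out.logits[0, pos].topk(5).indices
print(tokenizer.convert_ids_to_tokens(top5.tolist()))
```

Comparing the two [MASK] positions in the Chinese segment makes the point sharply: their token embeddings are identical, yet their final hidden states differ, because self-attention has mixed in position-specific evidence from the surrounding tokens, including the parallel English sentence.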

Updated 2025-10-10

Tags: Ch.2 Generative Models - Foundations of Large Language Models, Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences, Analysis in Bloom's Taxonomy, Cognitive Psychology, Psychology, Social Science, Empirical Science, Science