Short Answer

Analyzing Contextualization in Transformer Encoders

A Transformer encoder processes the masked bilingual input: [CLS] [MASK] 是 [MASK] 动物 。 [SEP] Whales [MASK] [MASK] . [SEP] (是 = "is/are", 动物 = "animal"; the target word 鲸鱼 means "whale"). The initial representation of the first [MASK] token is its input embedding, call it e1. After the full sequence passes through the encoder, the final representation at that same position is a contextualized hidden state, h1. Explain why h1 is a far better representation than e1 for predicting the original word ('鲸鱼'), and identify the source of the additional information that enriches h1.
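To see the contrast concretely, here is a minimal sketch, assuming PyTorch, the Hugging Face transformers library, and the multilingual checkpoint bert-base-multilingual-cased (the model name, indexing, and top-5 probe are illustrative assumptions, not part of the exercise). It extracts the embedding-layer output at the first [MASK] position, which plays the role of e1, and the top-layer hidden state h1, then lets the masked-language-model head score vocabulary words from h1.

```python
# Minimal sketch (assumptions: PyTorch, Hugging Face `transformers`, and the
# checkpoint "bert-base-multilingual-cased") contrasting the near-static
# embedding e1 with the contextualized hidden state h1.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)

# The tokenizer adds [CLS]/[SEP] itself; the two segments mirror the question.
text = "[MASK] 是 [MASK] 动物 。"
pair = "Whales [MASK] [MASK] ."
inputs = tokenizer(text, pair, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states[0] is the embedding-layer output (token + position + segment
# embeddings): every [MASK] gets the same token embedding here, so it carries
# essentially no word identity -- this plays the role of e1.
# hidden_states[-1] is the top encoder layer: the contextualized h1.
mask_id = tokenizer.mask_token_id
pos = (inputs["input_ids"][0] == mask_id).nonzero(as_tuple=True)[0][0].item()

e1 = out.hidden_states[0][0, pos]
h1 = out.hidden_states[-1][0, pos]

# The MLM head predicts the original word from h1, not from e1: h1 has
# absorbed, via self-attention, evidence from 动物, the punctuation, and the
# parallel English segment ("Whales ...").
top5 = out.logits[0, pos].topk(5).indices
print(tokenizer.convert_ids_to_tokens(top5.tolist()))
```

Comparing the two [MASK] positions in the Chinese segment makes the point sharply: their token embeddings are identical, yet their final hidden states differ, because self-attention has mixed in position-specific evidence from the surrounding tokens, including the parallel English sentence.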

Updated 2025-10-10

Tags: Ch.2 Generative Models - Foundations of Large Language Models, Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences, Analysis in Bloom's Taxonomy, Cognitive Psychology, Psychology, Social Science, Empirical Science, Science