Short Answer

Analyzing Unexpected Attention Output

Imagine an attention mechanism where, at a specific step, the computed attention weights for three input items are [0.9, 0.05, 0.05], so the first item is by far the most strongly attended. Despite this strong focus, the final output vector turns out to be composed almost entirely of information from the second input item (the one with a weight of 0.05). What is the most likely reason for this discrepancy, considering how the final output is constructed from the attention weights?
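A minimal numeric sketch of the scenario described above, assuming the standard construction in which the output is the attention-weighted sum of value vectors (the value vectors here are hypothetical, chosen so that the second item's vector has a much larger norm than the others):

```python
import numpy as np

# Attention weights from the question: item 1 is strongly favored.
weights = np.array([0.9, 0.05, 0.05])

# Hypothetical value vectors: item 2's vector has a far larger norm.
values = np.array([
    [1.0, 1.0],      # item 1: ordinary magnitude
    [100.0, 100.0],  # item 2: very large magnitude
    [1.0, 1.0],      # item 3: ordinary magnitude
])

# The output is the weighted sum of the value vectors.
output = weights @ values

# Per-item contribution to the output vector.
contributions = weights[:, None] * values
print(output)        # each component: 0.9*1 + 0.05*100 + 0.05*1 = 5.95
print(contributions) # item 2 contributes 5.0 of 5.95 despite its 0.05 weight
```

Even with a weight of only 0.05, item 2's contribution (0.05 x 100 = 5.0) dwarfs item 1's (0.9 x 1 = 0.9), so the output is dominated by item 2.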


Updated 2025-10-02


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science