Learn Before
Analyzing Unexpected Attention Output
Imagine an attention mechanism where, for a specific step, the calculated attention weights for three input items are [0.9, 0.05, 0.05]. The first item is the most relevant. Despite this strong focus, the final output vector is found to be almost entirely composed of the information from the second input item (the one with a weight of 0.05). What is the most likely reason for this discrepancy, considering how the final output is constructed?
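A minimal NumPy sketch of the scenario above, assuming the output is the weight-weighted sum of value vectors and that the second item's value vector has a much larger magnitude than the others (the hypothetical cause of the discrepancy; the specific numbers are illustrative, not from the question):

```python
import numpy as np

# Attention weights from the question: item 1 is strongly favored.
weights = np.array([0.9, 0.05, 0.05])

# Hypothetical value vectors. Item 2's vector is assumed to have a
# much larger magnitude than the others.
values = np.array([
    [1.0, 1.0],      # item 1: ordinary magnitude
    [100.0, 100.0],  # item 2: very large magnitude
    [1.0, 1.0],      # item 3: ordinary magnitude
])

# Attention output = weighted sum of the value vectors.
output = weights @ values            # [5.95, 5.95]

# Item 2's contribution alone: 0.05 * 100 = 5.0 per dimension,
# i.e. ~84% of the total output despite its tiny weight.
item2_contribution = weights[1] * values[1]
print(output, item2_contribution)
```

This shows why a small attention weight can still dominate the output: the weights scale the value vectors, so a value vector with a large enough norm outweighs a strongly attended but ordinary-magnitude one.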
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
In a simplified attention mechanism processing an input sequence, the attention scores for a particular output are calculated as [0.1, 0.8, 0.1] for the three input items respectively. If the information-carrying vector for the second input item (the one with the 0.8 score) was replaced with a zero vector (a vector containing only zeros), what would be the most direct consequence for the output of the attention layer?
Analyzing Unexpected Attention Output
Calculating Attention Output
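The first related question above can be sketched the same way, again assuming the output is the weighted sum of value vectors (the non-zero value vectors here are illustrative):

```python
import numpy as np

# Attention weights from the related question: item 2 is strongly favored.
weights = np.array([0.1, 0.8, 0.1])

# Hypothetical value vectors, with item 2's vector replaced by zeros.
values = np.array([
    [2.0, 2.0],  # item 1
    [0.0, 0.0],  # item 2: zero vector, as posed in the question
    [4.0, 4.0],  # item 3
])

# The dominant weight (0.8) now multiplies zeros, so the output is
# only the small residual contributions of items 1 and 3.
output = weights @ values
print(output)  # [0.6, 0.6]
```

The direct consequence: the output's magnitude collapses, because 80% of the weighted sum contributes nothing and only the weakly attended items remain.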