Learn Before
In a self-attention mechanism, the output for a specific token is calculated as a weighted sum of 'value' vectors from all tokens in the sequence. If the attention weight connecting a query token to a specific value token is exactly zero, that value token contributes nothing to the final output for the query token.
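A minimal sketch of this weighted-sum step, assuming NumPy and made-up toy value vectors (the shapes and variable names are illustrative, not from any specific library API):

```python
import numpy as np

# Toy value vectors for a 3-token sequence (one 4-dimensional vector per token).
values = np.array([
    [1.0, 2.0, 3.0, 4.0],    # token 0
    [5.0, 6.0, 7.0, 8.0],    # token 1
    [9.0, 10.0, 11.0, 12.0]  # token 2
])

# Attention weights from one query token to every token in the sequence.
# The weight on token 1 is exactly zero, so its value vector is multiplied
# by 0 and drops out of the sum entirely.
weights = np.array([0.6, 0.0, 0.4])

output = weights @ values  # weighted sum of value vectors

# Recomputing without token 1 gives an identical result, confirming that
# a zero-weight token contributes nothing to the output.
output_without_token1 = weights[[0, 2]] @ values[[0, 2]]
assert np.allclose(output, output_without_token1)
print(output)  # [4.2 5.2 6.2 7.2]
```

The assertion makes the claim above concrete: because token 1's weight is exactly 0.0, deleting it from the sum leaves the output unchanged.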
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Distributed Computation of Weighted Value Sums
Single-Query Attention Computation with Multiplicative Scaling
Calculating an Attention Output Vector
In a self-attention mechanism, the output for a given input element is a weighted sum of 'value' vectors from all elements in the sequence. Consider the calculation for the word 'sat' in the phrase 'The cat sat on the mat'. If the attention weights from 'sat' to each word in the sequence (including itself) are: 'The': 0.05, 'cat': 0.45, 'sat': 0.05, 'on': 0.0, 'the': 0.0, 'mat': 0.45. Which of the following statements best describes the resulting output vector for 'sat'?
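To make the arithmetic concrete, here is a worked version of this example in NumPy; the 2-D value vectors are hypothetical, since the question does not specify them:

```python
import numpy as np

tokens = ["The", "cat", "sat", "on", "the", "mat"]
weights = np.array([0.05, 0.45, 0.05, 0.0, 0.0, 0.45])  # attention from 'sat'

# Hypothetical value vectors, one per token.
values = np.array([
    [0.1, 0.2],  # The
    [1.0, 0.0],  # cat
    [0.3, 0.3],  # sat
    [5.0, 5.0],  # on   (weight 0.0: contributes nothing)
    [7.0, 7.0],  # the  (weight 0.0: contributes nothing)
    [0.0, 1.0],  # mat
])

output = weights @ values  # weighted sum over all six value vectors
print(output)  # [0.47  0.475]
```

Whatever vectors are assigned to 'on' and 'the', the output is unchanged, since their weights are exactly zero; the result is dominated equally by 'cat' and 'mat' (0.45 each), with small contributions from 'The' and 'sat'.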
Sequence Parallelism