In a self-attention mechanism where the prediction for a token at position i can only depend on tokens from positions 0 up to and including i, what is the total number of query-key dot products computed for an entire input sequence of 5 tokens (indexed 0 to 4)?
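A quick way to check the count: under the causal constraint, the query at position i is dotted with the keys at positions 0 through i, contributing i + 1 products, so a 5-token sequence yields 1 + 2 + 3 + 4 + 5 = 15. The short sketch below (plain Python, not part of the original card) computes this:

# Count the causal query-key dot products for n = 5 tokens.
# The query at position i is dotted with keys k_0 .. k_i,
# contributing i + 1 products.
n = 5
total = sum(i + 1 for i in range(n))  # 1 + 2 + 3 + 4 + 5
print(total)  # 15, equivalently n * (n + 1) // 2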
Tags
- Ch.2 Generative Models - Foundations of Large Language Models (Foundations of Large Language Models Course / Computing Sciences)
- Application in Bloom's Taxonomy (Cognitive Psychology / Psychology / Social Science / Empirical Science / Science)
Related
Explicit Enumeration of Causal Self-Attention Dot Products
- An autoregressive model processes a sequence of tokens, where the query for a given token i (denoted q_i) can only interact with key vectors from positions j where j ≤ i. For the 4th token in a sequence (indexed as 3), which of the following dot product computations would not be performed?
- An autoregressive model processes a sequence of tokens, where the query for a given token i (denoted q_i) can only interact with key vectors from positions j where j ≤ i. Match each query vector with the complete set of dot products it computes.
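For the matching question above, a small sketch (assumed Python; the q/k labels are illustrative, not from the card) that lists every dot product each query computes under the constraint j ≤ i:

# Enumerate the causal dot products explicitly for n = 5 tokens:
# query q_i pairs only with keys k_j for j <= i.
n = 5
for i in range(n):
    print(f"q{i}: " + ", ".join(f"q{i}.k{j}" for j in range(i + 1)))
# q0: q0.k0
# q1: q1.k0, q1.k1
# q2: q2.k0, q2.k1, q2.k2
# q3: q3.k0, q3.k1, q3.k2, q3.k3
# q4: q4.k0, q4.k1, q4.k2, q4.k3, q4.k4   (15 dot products in all)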