In a self-attention mechanism designed to process information sequentially without looking ahead, a sequence of 8 elements (indexed 0 to 7) is being processed. The query vector for the element at position 5 will be compared against a total of ____ key vectors.
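As a quick check of the counting rule (a minimal sketch, not part of the original card): with causal masking, the query at position i is compared against the keys at positions 0 through i inclusive, so the count is i + 1.

```python
# Minimal sketch of the causal-attention counting rule:
# the query at position i may attend only to keys at positions j <= i.
def visible_keys(i: int) -> int:
    # Positions 0..i inclusive, hence i + 1 key vectors.
    return i + 1

# For a sequence of 8 elements (positions 0..7), the query at position 5
# is compared against the keys at positions 0, 1, 2, 3, 4, and 5.
print(visible_keys(5))  # -> 6
```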
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A self-attention mechanism is configured to ensure that when processing a sequence, the output for any given position i is influenced only by inputs from positions j where j <= i. This prevents the model from 'seeing' future elements. The interaction between the query from position i and the key from position j results in a score. For a sequence of 4 elements (positions 0, 1, 2, 3), which of the following score matrices violates this principle? ('S' indicates a calculated score; '0' indicates a disallowed or masked interaction.)
Debugging an Autoregressive Model's Attention
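The masking principle in this question can be sketched as a small check, assuming each 'S' entry is stored as a nonzero value in a plain 4x4 matrix (the function and variable names here are illustrative):

```python
# Hypothetical sketch: test a 4x4 score matrix against the causal rule
# that position i may only interact with positions j where j <= i.
def violates_causality(scores):
    # Any nonzero score strictly above the diagonal (j > i) means a
    # query "saw" a future element, which the mask should forbid.
    n = len(scores)
    return any(scores[i][j] != 0
               for i in range(n)
               for j in range(i + 1, n))

# Valid lower-triangular matrix: 'S' represented as 1, masked entries as 0.
valid = [[1, 0, 0, 0],
         [1, 1, 0, 0],
         [1, 1, 1, 0],
         [1, 1, 1, 1]]

# Invalid: the score at row 1, column 3 lets position 1 see position 3.
invalid = [[1, 0, 0, 0],
           [1, 1, 0, 1],
           [1, 1, 1, 0],
           [1, 1, 1, 1]]

print(violates_causality(valid))    # -> False
print(violates_causality(invalid))  # -> True
```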