In a self-attention mechanism designed for a machine translation encoder, which processes an entire source sentence at once, the relative position offset between a query at position i and a key at position j (calculated as i - j) must always be greater than or equal to zero.
0
1
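A minimal sketch (not from the source) of why the statement is false for a bidirectional encoder: when every query position i can attend to every key position j in the sentence, the offset i - j is negative whenever the key comes after the query. The sequence length of 4 below is an arbitrary illustrative choice.

# Relative position offsets i - j in bidirectional (encoder-style) self-attention
# over a hypothetical 4-token source sentence.
seq_len = 4

offsets = [[i - j for j in range(seq_len)] for i in range(seq_len)]

for i, row in enumerate(offsets):
    print(f"query position {i}: offsets {row}")

# Output:
# query position 0: offsets [0, -1, -2, -3]
# query position 1: offsets [1, 0, -1, -2]
# query position 2: offsets [2, 1, 0, -1]
# query position 3: offsets [3, 2, 1, 0]

Because negative offsets occur whenever a query attends to a later key, the constraint i - j >= 0 holds only under causal (decoder-style) masking, not in an encoder that processes the whole source sentence at once.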
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An engineer is inspecting a self-attention layer and observes that for a given query token, the set of calculated relative position offsets (query_position - key_position) includes both positive and negative values. What can be concluded about the nature of this attention mechanism?
Choosing an Attention Mechanism for a Language Task