Choosing an Attention Mechanism for a Language Task
Based on the provided scenario, which group's argument is more appropriate for the task of whole-document sentiment analysis? Justify your answer by explaining how the range of allowed position offsets relates to the model's ability to understand the context of the entire review.
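To make the offset argument concrete, here is a minimal sketch (not part of the original question, using a hypothetical toy sequence) of how the sign of the offset i - j separates a causal mechanism, which only ever produces offsets >= 0 and therefore sees left context only, from a bidirectional one, which produces both signs and can attend over the entire review:

```python
import numpy as np

# Toy sequence of 5 token positions.
seq_len = 5
positions = np.arange(seq_len)

# offsets[i, j] = query_position - key_position = i - j
offsets = positions[:, None] - positions[None, :]

# Causal (decoder-style) attention masks out future keys (j > i),
# so only non-negative offsets survive: left context only.
causal_offsets = offsets[offsets >= 0]

# Bidirectional (encoder-style) attention allows every key,
# so both positive (past) and negative (future) offsets occur,
# letting each token condition on the whole document.
bidirectional_offsets = np.unique(offsets)

print("causal offsets:       ", np.unique(causal_offsets).tolist())
print("bidirectional offsets:", bidirectional_offsets.tolist())
```

Running this prints causal offsets of [0, 1, 2, 3, 4] versus bidirectional offsets of [-4, ..., 4], which is the distinction the question asks you to connect to whole-document sentiment analysis.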
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An engineer is inspecting a self-attention layer and observes that for a given query token, the set of calculated relative position offsets (query_position - key_position) includes both positive and negative values. What can be concluded about the nature of this attention mechanism?
In a self-attention mechanism designed for a machine translation encoder, which processes an entire source sentence at once, the relative position offset between a query at position i and a key at position j (calculated as i - j) must always be greater than or equal to zero.