Learn Before
An autoregressive model is processing the input sequence 'The quick brown fox'. When calculating the output representation for the token 'brown' (the third token), which set of tokens can it attend to if a causal attention mechanism is being used?
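The causal constraint in the question can be sketched concretely. The snippet below is a minimal illustration (the lower-triangular NumPy mask is an assumed stand-in for the model's causal attention mask, not code from the card): it lists which tokens 'brown' may attend to.

```python
import numpy as np

# Minimal sketch of causal masking for the sequence "The quick brown fox".
tokens = ["The", "quick", "brown", "fox"]
n = len(tokens)

# Causal mask: position i may attend only to positions j <= i
# (itself and all earlier tokens, never later ones).
mask = np.tril(np.ones((n, n), dtype=bool))

# Tokens visible when computing the representation of "brown" (index 2).
visible = [tokens[j] for j in range(n) if mask[2, j]]
print(visible)  # ['The', 'quick', 'brown']
```

Under this mask, 'brown' attends to 'The', 'quick', and itself, but not to the future token 'fox'.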
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Attention Weight with Relative Positional Encoding
A language model is designed to generate a sentence one word at a time, from beginning to end. To generate the word at a specific position i, it uses an attention mechanism to weigh the importance of the words that came before it. Which of the following statements correctly analyzes the structural constraint required for this mechanism to function properly for this specific task?
Formula for Attention Weight with Relative Positional Encoding
Analyzing Attention Mechanism Constraints