Learn Before
An autoregressive model is processing the input sequence 'The quick brown fox'. When calculating the output representation for the token 'brown' (the third token), which set of tokens can it attend to if a causal attention mechanism is being used?
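The causal constraint in the question can be sketched concretely. The snippet below is a minimal illustration (the lower-triangular NumPy mask is an assumed stand-in for the model's causal attention mask, not code from the card): it lists which tokens 'brown' may attend to.

```python
import numpy as np

# Minimal sketch of causal masking for the sequence "The quick brown fox".
tokens = ["The", "quick", "brown", "fox"]
n = len(tokens)

# Causal mask: position i may attend only to positions j <= i
# (itself and all earlier tokens, never later ones).
mask = np.tril(np.ones((n, n), dtype=bool))

# Tokens visible when computing the representation of "brown" (index 2).
visible = [tokens[j] for j in range(n) if mask[2, j]]
print(visible)  # ['The', 'quick', 'brown']
```

Under this mask, 'brown' attends to 'The', 'quick', and itself, but not to the future token 'fox'.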
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Attention Weight with Relative Positional Encoding
A language model is designed to generate a sentence one word at a time, from beginning to end. To generate the word at a specific position i, it uses an attention mechanism to weigh the importance of the words that came before it. Which of the following statements correctly analyzes the structural constraint required for this mechanism to function properly for this specific task?
Formula for Attention Weight with Relative Positional Encoding
Analyzing Attention Mechanism Constraints