Learn Before
Multiple Choice

In a self-attention mechanism designed for autoregressive tasks, a sequence of 5 tokens is processed. The mechanism computes raw attention scores for each token relative to all other tokens. Before the final softmax normalization, a causal mask is added to these scores to prevent any token from attending to future tokens. For the 3rd token in the sequence, which vector correctly represents its scores for all 5 tokens after the causal mask has been applied? (Let s_i denote the 3rd token's original raw score for attending to the i-th token.)
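The masking step described above can be sketched in NumPy. This is a minimal illustration, not any particular library's implementation: the score values are made up, and the mask follows the standard convention of adding negative infinity at every future position so that softmax assigns those positions zero weight.

```python
import numpy as np

# Toy raw attention scores for a 5-token sequence (values are illustrative).
scores = np.arange(25, dtype=float).reshape(5, 5)

# Causal mask: entry (i, j) is -inf wherever j > i, so token i cannot
# attend to future tokens; allowed positions get 0 (scores unchanged).
mask = np.where(np.triu(np.ones((5, 5)), k=1) == 1, -np.inf, 0.0)
masked = scores + mask

# Softmax over each row; the -inf entries become exactly 0 attention weight.
weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# Row index 2 is the 3rd token: its first three scores survive,
# while positions 4 and 5 are masked to -inf.
print(masked[2])
print(weights[2])
```

For the 3rd token, the masked row has the pattern (s_1, s_2, s_3, -inf, -inf), which is exactly the situation the question describes.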

Updated 2025-09-26


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.5 Inference - Foundations of Large Language Models

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science