Learn Before
True or False: When calculating the output for the token at position i=5 in a sequence using a causal attention mechanism, the value vector from position j=6 (v_6) is incorporated into the weighted sum.
- 0 (False)
- 1 (True)
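The answer is false: causal attention restricts the weighted sum for position i to value vectors v_j with j ≤ i, so v_6 receives zero weight when i = 5. A minimal NumPy sketch (the raw scores are random placeholders, not values from the card) shows the mask forcing that weight to zero:

```python
import numpy as np

# Causal attention at query position i masks every key/value position j > i,
# so v_6 can never contribute to the output at i = 5.
i, seq_len = 5, 8
scores = np.random.randn(seq_len)          # raw attention scores for q_5 (placeholders)
scores[np.arange(seq_len) > i] = -np.inf   # causal mask: block all j > i
weights = np.exp(scores - scores.max())
weights /= weights.sum()                   # softmax over the unmasked positions

print(weights[6])  # 0.0 -- v_6 gets zero weight and drops out of the weighted sum
```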
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An auto-regressive model is processing a sequence of 4 tokens. To compute the output for the token at position i=2, it uses a causal attention mechanism. Given the value vectors and the calculated attention weights below, what is the resulting output vector for this position?

Value Vectors:
- v_0 = [1.0, 0.0]
- v_1 = [0.0, 2.0]
- v_2 = [3.0, 1.0]
- v_3 = [2.0, 2.0]

Attention Weights for position i=2:
- Weight for v_0: 0.1
- Weight for v_1: 0.3
- Weight for v_2: 0.6
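For reference, the output is the weighted sum over the visible positions j ≤ 2 (causality gives v_3 a weight of 0): 0.1·[1.0, 0.0] + 0.3·[0.0, 2.0] + 0.6·[3.0, 1.0] = [1.9, 1.2]. A minimal NumPy sketch of the same computation:

```python
import numpy as np

# Weighted sum of value vectors for position i = 2 under causal attention.
values = np.array([[1.0, 0.0],   # v_0
                   [0.0, 2.0],   # v_1
                   [3.0, 1.0],   # v_2
                   [2.0, 2.0]])  # v_3 (masked out: j = 3 > i = 2)
weights = np.array([0.1, 0.3, 0.6, 0.0])  # attention weights for i = 2

output = weights @ values
print(output)  # [1.9 1.2]
```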
When calculating the output for the token at position i=5 in a sequence using a causal attention mechanism, the value vector from position j=6 (v_6) is incorporated into the weighted sum.

Given the formula for the output of a causal attention mechanism for a specific query vector q_i, match each component of the formula to its correct description.
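The formula in question is presumably the standard causal attention output, output_i = Σ_{j=0}^{i} α_{i,j} · v_j, where α_{i,j} = softmax_{j≤i}(q_i · k_j / √d_k). A minimal Python sketch labeling each component (the function name and random test data are illustrative assumptions, not part of the original card):

```python
import numpy as np

def causal_attention_output(q_i, K, V, i):
    """Output for query q_i attending only to positions j <= i."""
    d_k = q_i.shape[-1]
    scores = K[: i + 1] @ q_i / np.sqrt(d_k)  # scaled scores q_i . k_j / sqrt(d_k), j <= i
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                      # attention weights alpha_{i,j}
    return alpha @ V[: i + 1]                 # weighted sum: sum_j alpha_{i,j} * v_j

# Illustrative data: 4 key/value pairs of dimension 2 (not from the card).
rng = np.random.default_rng(0)
K, V = rng.normal(size=(4, 2)), rng.normal(size=(4, 2))
q_2 = rng.normal(size=2)
print(causal_attention_output(q_2, K, V, i=2))
```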