Multiple Choice

A language model is calculating an output vector using a sparse attention mechanism. The computation for the current token only considers a subset of previous tokens, identified by the index set G = {0, 2, 3}. Given the value vectors and corresponding attention weights below, what is the correct output vector?

Value Vectors:

  • v_0 = [2, 1]
  • v_1 = [4, 5]
  • v_2 = [6, 0]
  • v_3 = [1, 3]

Attention Weights for the included set G:

  • α'_0 = 0.5
  • α'_2 = 0.2
  • α'_3 = 0.3
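
The output is the weighted sum of the value vectors whose indices are in G; v_1 is skipped because index 1 is not in G. The following is a minimal sketch of that arithmetic in plain Python, assuming the weights α'_j are already normalized over G (they sum to 1.0 here). The function name `sparse_attention_output` and the dictionary layout are illustrative choices, not part of the question.

```python
# Value vectors from the question, keyed by token index.
values = {0: [2, 1], 1: [4, 5], 2: [6, 0], 3: [1, 3]}

# Attention weights, defined only for the indices in G.
weights = {0: 0.5, 2: 0.2, 3: 0.3}

# Index set of previous tokens the current token attends to.
G = [0, 2, 3]


def sparse_attention_output(values, weights, index_set):
    """Return sum over j in index_set of weights[j] * values[j]."""
    dim = len(values[index_set[0]])
    out = [0.0] * dim
    for j in index_set:  # indices outside G (here, 1) never contribute
        for d in range(dim):
            out[d] += weights[j] * values[j][d]
    return out


output = sparse_attention_output(values, weights, G)
# Round only for display, to hide floating-point noise.
print([round(x, 6) for x in output])
```

By hand, the same sum is 0.5·[2, 1] + 0.2·[6, 0] + 0.3·[1, 3] = [1.0, 0.5] + [1.2, 0] + [0.3, 0.9] = [2.5, 1.4].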

Updated 2025-09-26

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science