Multiple Choice

An autoregressive language model is generating a sequence of tokens. The attention mechanism has already processed the first three tokens, resulting in the following Value matrix V stored in its cache, where each row corresponds to a token:

V = [[0.1, 0.8], [0.5, 0.2], [0.9, 0.3]]

For the fourth token, the model computes a new value vector:

v_new = [0.4, 0.6]

According to the standard update rule for the cache, what will the new Value matrix V be after this fourth token is processed?

0

1

Updated 2025-09-28

Contributors are:

Who are from:

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science