True/False

Consider an attention mechanism where the output of head j at position i is computed as head_j = Att_qkv(q_i^[j], K_<=i^[g(j)], V_<=i^[g(j)]). Here q_i^[j] is the query vector unique to head j, K_<=i and V_<=i are the keys and values for all positions up to i, and the function g(j) maps head j to a potentially shared key-value group.

Statement: If two distinct query heads, j1 and j2, are mapped to the same key-value group (meaning g(j1) = g(j2)), their final output vectors, head_j1 and head_j2, will necessarily be identical.
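The statement can be checked numerically. Below is a minimal sketch (the helper `att_qkv` and all shapes are illustrative assumptions, not from the source): two heads share one key-value group, but each uses its own query vector, so their attention weights, and therefore their outputs, differ.

```python
import numpy as np

def att_qkv(q, K, V):
    # Scaled dot-product attention for a single query vector q
    # over keys K and values V (illustrative helper, assumed here).
    scores = K @ q / np.sqrt(q.shape[0])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

rng = np.random.default_rng(0)
d = 8   # head dimension (assumed)
i = 5   # number of positions attended over, i.e. K_<=i, V_<=i

# One shared key-value group: g(j1) = g(j2)
K = rng.normal(size=(i, d))
V = rng.normal(size=(i, d))

# Distinct per-head queries: q_i^[j1] != q_i^[j2]
q_j1 = rng.normal(size=d)
q_j2 = rng.normal(size=d)

head_j1 = att_qkv(q_j1, K, V)
head_j2 = att_qkv(q_j2, K, V)

# Sharing K and V does not force identical outputs: the attention
# weights depend on the query, which differs between the two heads.
print(np.allclose(head_j1, head_j2))
```

With random queries the printed result is almost surely `False`, which is why the statement above does not hold in general: the heads coincide only in the degenerate case where their queries produce identical attention weights.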

Updated 2025-10-08


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science