Fill in the Blank

In a Multi-Query Attention (MQA) layer, all attention heads share the same Key and Value matrices. The output of a specific head j at step i is given by: head_j = Att_qkv(______, K_<=i, V_<=i), where K_<=i and V_<=i are the shared keys and values for positions up to and including i. Which component fills the blank, i.e. the one input that is unique to head j?

Updated 2025-10-08

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science