
General Attention Formula

The general attention mechanism maps a set of queries, keys, and values to an output. The output is a weighted sum of the value vectors, where the weights are determined by a compatibility function between the queries and keys. In matrix form: $\mathrm{Att}_{qkv}(\mathbf{Q}, \mathbf{K}, \mathbf{V}) = \alpha(\mathbf{Q}, \mathbf{K})\mathbf{V}$. Here $\mathbf{Q}$, $\mathbf{K}$, and $\mathbf{V}$ are the query, key, and value matrices, respectively, and $\alpha(\mathbf{Q}, \mathbf{K})$ is the attention weight matrix, which in self-attention has dimensions $m \times m$, where $m$ is the sequence length.
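The formula above can be sketched in NumPy. Note that the general formula leaves the compatibility function $\alpha$ open; the sketch below assumes the common scaled dot-product choice, $\alpha(\mathbf{Q}, \mathbf{K}) = \mathrm{softmax}(\mathbf{Q}\mathbf{K}^\top / \sqrt{d_k})$, which is one option rather than the only one.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # alpha(Q, K): scaled dot-product compatibility, then softmax so each
    # row of weights sums to 1 (an assumed choice of compatibility function).
    d_k = Q.shape[-1]
    alpha = softmax(Q @ K.T / np.sqrt(d_k))  # (m, m) in self-attention
    return alpha @ V, alpha                  # output is a weighted sum of value rows

# Self-attention example: m = 4 tokens, each with d = 8 features,
# so Q, K, and V are all derived from the same matrix X here.
m, d = 4, 8
rng = np.random.default_rng(0)
X = rng.normal(size=(m, d))
out, alpha = attention(X, X, X)
print(out.shape, alpha.shape)  # (4, 8) (4, 4)
```

Each row of `alpha` is a probability distribution over the $m$ positions, so each output row is a convex combination of the value vectors.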

Updated 2026-05-02

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences