
Parallel Self-Attention in the Prefilling Phase

A key characteristic of the prefilling phase is its ability to process the entire input sequence simultaneously. This allows for a highly parallelized self-attention computation in which all query vectors are stacked into a single matrix Q. This approach makes efficient use of the parallel computing capabilities of modern GPUs, which significantly speeds up the prefilling process.
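The idea above can be sketched as a small numerical example. The snippet below, a minimal illustration using NumPy (function names and shapes are my own, not from the source), computes causal self-attention for all prompt positions in one batched matrix product, and checks that it matches the slower token-by-token loop that incremental decoding would perform.

```python
import numpy as np

def prefill_attention(Q, K, V):
    """Causal self-attention over the whole prompt at once.

    Q, K, V: (n, d) arrays holding one query/key/value vector per token.
    All n queries are processed in a single matrix product, which is
    what makes the prefilling phase GPU-friendly.
    """
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                  # (n, n) attention scores
    mask = np.triu(np.ones((n, n), dtype=bool), 1) # hide future positions
    scores = np.where(mask, -np.inf, scores)
    scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                             # (n, d) outputs

def sequential_attention(Q, K, V):
    """Same computation, one query at a time (decoding-style reference)."""
    outputs = []
    d = Q.shape[1]
    for t in range(Q.shape[0]):
        s = Q[t] @ K[: t + 1].T / np.sqrt(d)       # attend to tokens 0..t only
        s -= s.max()
        w = np.exp(s)
        w /= w.sum()
        outputs.append(w @ V[: t + 1])
    return np.stack(outputs)

rng = np.random.default_rng(0)
n, d = 6, 8
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
parallel = prefill_attention(Q, K, V)
serial = sequential_attention(Q, K, V)
assert np.allclose(parallel, serial)
```

The two functions are mathematically equivalent; the difference is that `prefill_attention` exposes the full (n, n) score computation as one matrix multiply, which a GPU can execute in parallel, whereas the per-token loop reflects the inherently serial structure of the decoding phase.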


Updated 2025-10-07


Ch.5 Inference - Foundations of Large Language Models
