Concept

Cross-Layer Parameter Sharing in Transformers

Cross-layer parameter sharing is an optimization technique for Transformers that belongs to the broader family of shared-weight and shared-activation methods. By sharing elements such as key-value (KV) activations or attention weights across layers, it reduces both compute and memory footprint. For example, a query in a higher layer can attend directly over the KV cache of a lower layer, so those KV activations never need to be recomputed or stored a second time.
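
The sketch below illustrates the KV-sharing case as a minimal PyTorch example. The module and parameter names (`SharedKVAttention`, `owns_kv`, `shared_kv`) are illustrative assumptions rather than a specific library API: one attention layer computes and caches its keys and values, and the layer above projects only queries and reuses that shared cache, roughly halving KV storage for the pair of layers.

```python
# Minimal sketch of cross-layer KV sharing (hypothetical module names), assuming PyTorch.
# A layer with owns_kv=False reuses the key/value tensors computed by a lower layer
# instead of projecting its own, so the shared KV cache is stored only once.

import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedKVAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int, owns_kv: bool):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.owns_kv = owns_kv                      # False -> reuse KV from a lower layer
        self.q_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        if owns_kv:                                 # only KV-owning layers hold K/V projections
            self.k_proj = nn.Linear(d_model, d_model)
            self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, x, shared_kv=None):
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        if self.owns_kv:
            # Compute fresh K/V and hand them back so higher layers can reuse them.
            k = self.k_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
            v = self.v_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        else:
            # Reuse the lower layer's KV activations; no K/V projection or cache here.
            k, v = shared_kv
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = attn.transpose(1, 2).reshape(B, T, -1)
        return self.out_proj(out), (k, v)


# Usage: layer 0 owns its KV; layer 1 attends over layer 0's KV cache.
d_model, n_heads = 64, 4
layer0 = SharedKVAttention(d_model, n_heads, owns_kv=True)
layer1 = SharedKVAttention(d_model, n_heads, owns_kv=False)

x = torch.randn(2, 8, d_model)
h0, kv = layer0(x)                    # compute and cache K/V once
h1, _ = layer1(h0, shared_kv=kv)      # higher layer queries the shared KV cache
print(h1.shape)                       # torch.Size([2, 8, 64])
```

In practice this residual-free, two-layer pairing is a simplification; the design choice is simply which layers own a KV projection and which ones attend over a lower layer's cache.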

Updated 2026-05-02

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences