Learn Before
Concept

Performance Stabilization via Global Tokens

A key benefit of incorporating global tokens is the stabilization of model performance, particularly when processing very long sequences. By providing a consistent global context, these tokens help to smooth the output distribution of the Softmax function used in the attention mechanism.

0

1

Updated 2026-04-23

Contributors are:

Who are from:

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences