Definition

Final Hidden States in a Transformer Language Model

In a Transformer-based language model with $L$ layers, the final hidden states are the sequence of output vectors of the last Transformer block, denoted $\{\mathbf{h}_0^L, \dots, \mathbf{h}_{m-1}^L\}$, where $m$ is the number of input tokens. Each vector $\mathbf{h}_i^L$ is the contextualized representation of the $i$-th token after it has passed through the entire stack of $L$ layers. This sequence is the model's final representation of the input and serves as the basis for subsequent predictions, such as computing the logits for the next token.
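The definition can be illustrated with a minimal sketch in plain Python. Everything here is an assumption made for illustration: `toy_layer` is a hypothetical stand-in for a real Transformer block (it mixes each position with the average of earlier positions instead of using attention), and the names `final_hidden_states`, `W_out`, and the sizes `m`, `d`, `L`, `vocab` are not taken from any particular library or model. The point is only the data flow: $m$ input vectors pass through $L$ layers, the output of the last layer is the sequence of final hidden states, and the vector at the last position is projected to next-token logits.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def toy_layer(H, W):
    """Hypothetical stand-in for one Transformer block (NOT real attention).

    Position i sees only positions 0..i (causal), via a simple average,
    followed by a linear map, tanh, and a residual connection.
    """
    d = len(H[0])
    out = []
    for i in range(len(H)):
        ctx = [sum(H[j][k] for j in range(i + 1)) / (i + 1) for k in range(d)]
        update = [math.tanh(v) for v in matvec(W, ctx)]
        out.append([h + u for h, u in zip(H[i], update)])
    return out


def final_hidden_states(H0, layers):
    """Run the input vectors through every layer; return the last layer's output."""
    H = H0
    for W in layers:
        H = toy_layer(H, W)
    return H


rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


# Assumed toy sizes: m tokens, hidden size d, L layers, vocabulary size vocab.
m, d, L, vocab = 4, 8, 3, 10
H0 = rand_matrix(m, d)                      # input embeddings h_0^0 .. h_{m-1}^0
layers = [rand_matrix(d, d) for _ in range(L)]
W_out = rand_matrix(vocab, d)               # hypothetical output projection

HL = final_hidden_states(H0, layers)        # h_0^L .. h_{m-1}^L
next_token_logits = matvec(W_out, HL[-1])   # scores computed from h_{m-1}^L
```

Here `HL` holds one vector of size `d` per input token, and `next_token_logits` holds one score per vocabulary entry. Real models replace `toy_layer` with attention and feed-forward sublayers, but the shape of the computation is the same.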


Updated 2025-10-08

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences