Formula

Layer-wise Transformation of Hidden States

In a multi-layer neural network architecture, such as a Transformer, computation proceeds sequentially through the layers. The output of layer $l$, represented by the matrix of hidden states $\mathbf{H}^{l}$, becomes the input to the subsequent layer, $l+1$. This transformation is generally expressed by the formula:

$$\mathbf{H}^{l+1} = \text{Layer}(\mathbf{H}^{l})$$

This equation signifies that the hidden states of the next layer are a function of the current layer's hidden states, where $\text{Layer}(\cdot)$ encapsulates the layer's specific operations (e.g., self-attention and the feed-forward network).
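The update above can be sketched in plain Python. This is a hypothetical toy, not an actual Transformer layer: each "layer" here applies a linear map, a ReLU nonlinearity, and a residual connection, illustrating how each layer consumes the previous layer's hidden-state matrix and produces one of the same shape.

```python
def matmul(h, w):
    # (seq_len x d) @ (d x d) matrix product, pure Python
    return [[sum(h[i][k] * w[k][j] for k in range(len(w)))
             for j in range(len(w[0]))] for i in range(len(h))]

def layer(h, w):
    # Toy update: H^{l+1} = H^l + ReLU(H^l W)
    # (residual connection plus a simple nonlinearity)
    z = matmul(h, w)
    return [[h[i][j] + max(z[i][j], 0.0) for j in range(len(h[0]))]
            for i in range(len(h))]

seq_len, d_model, num_layers = 3, 4, 2
h = [[0.1 * (i + j) for j in range(d_model)] for i in range(seq_len)]  # H^0
ws = [[[0.05] * d_model for _ in range(d_model)] for _ in range(num_layers)]

# H^{l+1} = Layer(H^l): hidden states flow layer by layer
for w in ws:
    h = layer(h, w)

print(len(h), len(h[0]))  # shape is preserved across layers: 3 4
```

Note that the hidden-state matrix keeps the same shape at every layer, which is what allows an arbitrary number of identical layers to be stacked.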


Updated 2025-10-09


Ch.3 Prompting - Foundations of Large Language Models