Learn Before
Standard Transformer Encoding Procedure
The standard procedure for processing an input sequence with a Transformer encoder begins by representing each input token, $x_i$, as its corresponding embedding, $\mathbf{e}_i$. This sequence of embeddings, $\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n$, is then fed into the encoder. The encoder processes this input to produce a sequence of contextualized output vectors, or hidden states, $\mathbf{h}_1, \mathbf{h}_2, \dots, \mathbf{h}_n$.
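As a concrete illustration, here is a minimal sketch of this procedure in PyTorch. The library choice and the specific configuration (vocabulary size 30,000, hidden size 768, 6 layers, 12 attention heads, learned positional embeddings) are assumptions made for the example, not details from the original text:

```python
import torch
import torch.nn as nn

# Assumed example configuration (not specified in the original text).
vocab_size, d_model, n_layers, n_heads, max_len = 30_000, 768, 6, 12, 512

# Token embeddings map each token id x_i to its embedding e_i.
tok_emb = nn.Embedding(vocab_size, d_model)
# Learned positional embeddings inject token order, since self-attention
# alone is order-insensitive.
pos_emb = nn.Embedding(max_len, d_model)

# A stack of identical encoder layers (self-attention + feed-forward).
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                   batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

# Example input: a batch of one sequence containing 15 token ids.
x = torch.randint(0, vocab_size, (1, 15))
positions = torch.arange(x.size(1)).unsqueeze(0)   # shape (1, 15)

e = tok_emb(x) + pos_emb(positions)                # e_1..e_n: (1, 15, 768)
h = encoder(e)                                     # h_1..h_n: (1, 15, 768)
print(h.shape)  # torch.Size([1, 15, 768]) -- one hidden state per token
```

Note the shapes: for 15 input tokens and a hidden size of 768, the encoder returns a 15 × 768 sequence of contextualized vectors, one hidden state per input token.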
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Transformer Encoder:
Standard Transformer Encoding Procedure
Role of Positional Embeddings in Order-Insensitive Models
Key Hyperparameters of a Transformer Encoder
Transformer Encoding of a Masked Bilingual Sentence Pair
Prefix Tuning
In a sequence-to-sequence model, the input is processed by a stack of six encoder layers that have identical structures. A proposal is made to modify this architecture so that all six encoder layers share the exact same set of weights, with the goal of reducing the total number of model parameters. Which statement best analyzes the primary consequence of this change on the model's ability to process information?
A sentence is fed into the encoder side of a Transformer model. Arrange the following steps in the correct sequence to describe how the initial input is processed by the stack of encoders.
Improving a Transformer's Contextual Understanding
Learn After
A language model's encoder processes an input sequence consisting of 15 tokens. The model is configured with a hidden size of 768. What will be the dimensions of the final sequence of contextualized vectors produced by this encoder?
Self-Supervised Pre-training of Encoders via Masked Language Modeling
Applying a Pre-trained Encoder to Downstream Tasks
Arrange the following steps, which describe how a standard Transformer encoder processes a sequence of tokens, into the correct chronological order.
Interpreting a Transformer Encoder's Output