
Encoder Pre-training Output Architecture

To overcome the lack of direct supervision signals in encoder pre-training, a typical approach is to combine the encoder with output layers whose predictions can be supervised with easily obtained signals. For instance, adding a Softmax layer on top of a Transformer encoder creates an architecture identical to a decoder-based language model: the system outputs a probability distribution over the vocabulary at each position, and these distributions can be trained against targets that are easy to construct, such as the original tokens at selected positions.
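As a concrete illustration, here is a minimal PyTorch sketch of this encoder-plus-output-layer setup, using the library's built-in TransformerEncoder. The class name EncoderWithLMHead and all hyperparameters are hypothetical choices for the example, not an implementation from the source material.

```python
import torch
import torch.nn as nn

class EncoderWithLMHead(nn.Module):
    """Sketch: Transformer encoder + Softmax output layer.

    The output layer maps each encoder hidden state to a probability
    distribution over the vocabulary, giving the encoder the same kind
    of per-position output a decoder-based language model produces.
    All sizes below are illustrative, not prescribed values.
    """
    def __init__(self, vocab_size=30000, d_model=512, nhead=8, num_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)  # hidden state -> vocab logits

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))        # [batch, seq, d_model]
        return torch.softmax(self.lm_head(h), dim=-1)  # [batch, seq, vocab_size]

# Usage: a batch of 2 sequences of length 10 yields one
# probability distribution per position for training.
probs = EncoderWithLMHead()(torch.randint(0, 30000, (2, 10)))
print(probs.shape)  # torch.Size([2, 10, 30000])
```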

Tags: Foundations of Large Language Models, Ch.1 Pre-training - Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences