
Encoder Pre-training Output Architecture

To overcome the lack of direct supervision signals in encoder pre-training, a typical approach is to combine the encoder with output layers whose predictions can be supervised with easily obtained signals. For instance, adding a Softmax layer on top of a Transformer encoder creates an architecture identical to a decoder-based language model: the system outputs a probability distribution over the vocabulary at each position, and these distributions can be trained against targets that are easy to construct, such as the original tokens at selected positions.
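As a concrete illustration, here is a minimal PyTorch sketch of this encoder-plus-output-layer setup, using the library's built-in TransformerEncoder. The class name EncoderWithLMHead and all hyperparameters are hypothetical choices for the example, not an implementation from the source material.

```python
import torch
import torch.nn as nn

class EncoderWithLMHead(nn.Module):
    """Sketch: Transformer encoder + Softmax output layer.

    The output layer maps each encoder hidden state to a probability
    distribution over the vocabulary, giving the encoder the same kind
    of per-position output a decoder-based language model produces.
    All sizes below are illustrative, not prescribed values.
    """
    def __init__(self, vocab_size=30000, d_model=512, nhead=8, num_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)  # hidden state -> vocab logits

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))        # [batch, seq, d_model]
        return torch.softmax(self.lm_head(h), dim=-1)  # [batch, seq, vocab_size]

# Usage: a batch of 2 sequences of length 10 yields one
# probability distribution per position for training.
probs = EncoderWithLMHead()(torch.randint(0, 30000, (2, 10)))
print(probs.shape)  # torch.Size([2, 10, 30000])
```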

Tags: Foundations of Large Language Models, Ch.1 Pre-training - Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences