Concept

Pre-train and Fine-tune Paradigm for Encoder Models

The two-stage paradigm for Transformer encoder models consists of a pre-training phase and an application phase. During pre-training, the encoder is paired with a Softmax output layer and trained with a self-supervised objective to learn general language representations. In the application phase, this Softmax layer is discarded and the pre-trained encoder is combined with a task-specific prediction network. The combined system is then fine-tuned on labeled data so that it performs well on the specific downstream task.
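The following is a minimal PyTorch sketch of the two stages, not a reference implementation from the book: the class names (PretrainModel, FineTuneModel), hyperparameters, and the dummy data are illustrative assumptions, and the self-supervised objective is simplified (token prediction without masking) to keep the example short.

```python
import torch
import torch.nn as nn

class PretrainModel(nn.Module):
    """Stage 1: encoder + Softmax output layer for self-supervised pre-training."""
    def __init__(self, vocab_size=1000, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # Output layer used only during pre-training; discarded afterwards.
        self.softmax_head = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))
        return torch.log_softmax(self.softmax_head(h), dim=-1)

class FineTuneModel(nn.Module):
    """Stage 2: pre-trained encoder + task-specific prediction network
    (here a sequence classifier), fine-tuned end-to-end on labeled data."""
    def __init__(self, pretrained: PretrainModel, d_model=128, num_classes=2):
        super().__init__()
        self.embed = pretrained.embed       # reuse pre-trained weights
        self.encoder = pretrained.encoder   # reuse pre-trained weights
        self.classifier = nn.Linear(d_model, num_classes)  # new, task-specific

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))
        return self.classifier(h[:, 0])     # predict from the first position

# Stage 1: pre-train with self-supervision (masking omitted for brevity).
pretrain = PretrainModel()
tokens = torch.randint(0, 1000, (8, 16))    # dummy batch of token ids
loss = nn.functional.nll_loss(pretrain(tokens).flatten(0, 1), tokens.flatten())
loss.backward()                             # one illustrative update step

# Stage 2: drop the Softmax layer, attach a prediction network, fine-tune.
model = FineTuneModel(pretrain)
labels = torch.randint(0, 2, (8,))          # dummy task labels
loss = nn.functional.cross_entropy(model(tokens), labels)
loss.backward()
```

In a real setting, stage 1 runs over large unlabeled corpora with a masked-token objective, while stage 2 updates both the copied encoder weights and the new prediction head on the labeled task data.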



Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
