Pre-train and Fine-tune Paradigm for Encoder Models
The two-stage paradigm for Transformer encoder models consists of a pre-training phase and an application phase. During pre-training, the encoder is paired with a Softmax output layer and trained with a self-supervised objective (e.g., masked language modeling) to learn general-purpose language representations. In the subsequent application phase, this pre-training head is discarded, and the pre-trained encoder is combined with a task-specific prediction network. The combined system is then fine-tuned on labeled data so that it performs well on the specialized downstream task.
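As a minimal sketch of the application phase, assuming the Hugging Face transformers library is available; the checkpoint name, label count, example texts, and hyperparameters below are illustrative, not prescriptive:

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Application phase: loading a pre-trained checkpoint restores the encoder
# weights; the masked-LM Softmax head from pre-training is discarded, and a
# randomly initialized classification head is attached in its place.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",  # illustrative pre-trained encoder checkpoint
    num_labels=3,         # e.g. positive / negative / neutral sentiment
)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Fine-tuning: the combined system (encoder + new head) is updated
# end-to-end on a labeled batch; real training would loop over a dataset.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
batch = tokenizer(
    ["great product, works perfectly", "arrived broken, very disappointed"],
    return_tensors="pt", padding=True,
)
labels = torch.tensor([0, 1])  # hypothetical class indices for this batch
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
```

Note that `from_pretrained` here only reconstructs the application-phase model; the self-supervised pre-training stage itself is assumed to have already been carried out and saved as the checkpoint being loaded.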
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Transferring Knowledge of a PTM to Downstream NLP Tasks
Fine-Tuning Strategies
Applications of PTMs
Fine-tuning for Sequence Encoding Models
Fine-Tuning Pre-trained Models for Downstream Tasks
Freezing Encoder Parameters During Fine-Tuning
Discarding the Pre-training Head for Downstream Adaptation
Textual Instructions for Task Adaptation
Influence of Downstream Task on Model Architecture
Broad Applications of Fine-Tuning in LLM Development
Scope of Introductory Fine-Tuning Discussion
LLM Alignment
Necessity of Fine-Tuning for Downstream Task Adaptation
Fine-Tuning as a Standard Adaptation Method for LLMs
Prompting in Language Models
Fine-Tuning as a Mechanism for Activating Pre-Trained Knowledge
A startup wants to adapt a large, pre-trained language model to classify customer sentiment (positive, negative, neutral). They have a very small labeled dataset (fewer than 500 examples) and extremely limited access to high-performance computing, making extensive retraining financially unfeasible. Which adaptation approach is most suitable for their situation?
Efficiency of LLM Adaptation via Prompting
A developer intends to specialize a general-purpose, pre-trained language model for a new text classification task by updating its internal parameters. Arrange the following steps in the correct chronological order to accomplish this adaptation.
Selecting an Adaptation Strategy for a Pre-trained Model
Architectural Differences Between Sequence Encoding and Generation Models
BERT (Bidirectional Encoder Representations from Transformers)
Role of Encoders as Components in NLP Systems
Input and Output of a Sequence Encoder
Causal Attention Mechanism
An engineer is building a system to automatically categorize customer reviews as 'positive' or 'negative'. The first component of their system must read the raw text of a review and convert it into a single, fixed-size numerical vector that captures the overall sentiment and meaning. This vector will then be fed into a separate classification component. Which of the following best describes the function of this first component?
A company develops a sophisticated model that takes a user's question as input and produces a detailed numerical representation that captures the question's full meaning. This model, by itself, is sufficient to function as a complete question-answering system.
The Role of Sequence Encoding in Text-Based Prediction
Learn After
Self-Supervised Pre-training of Encoders via Masked Language Modeling
Applying a Pre-trained Encoder to Downstream Tasks
BERT as an Illustrative Example of Pre-training and Application
A team is building a model to classify customer support emails into categories like 'Billing Inquiry', 'Technical Issue', or 'Feedback'. They have access to two datasets: 1) a massive, diverse collection of text from the internet, and 2) a curated set of 10,000 support emails, each correctly labeled with its category. Based on the standard two-stage training paradigm for this type of model, which statement best describes the distinct role and objective for each dataset?
A machine learning engineer is building a model to classify legal documents as 'Contract', 'Pleading', or 'Motion'. They are following the standard two-stage paradigm for this type of model. Arrange the following steps in the correct chronological order.
Diagnosing a Model Training Failure