Performance Degradation and Early Stopping in Pre-training
During the pre-training of language models, performance on held-out data can begin to decline after a certain point. This degradation is sometimes attributed to interference, where learning new information degrades previously learned knowledge. A practical countermeasure is early stopping: halting training once held-out performance stops improving, so that the model is kept at its best-performing point rather than damaged by further updates.
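For concreteness, a minimal sketch of such an early-stopping loop is shown below. The function names (`train_chunk`, `evaluate`, `save_checkpoint`) and the interval and patience values are illustrative assumptions, not anything specified in this note:

```python
def pretrain_with_early_stopping(train_chunk, evaluate, save_checkpoint,
                                 max_steps=1_000_000,
                                 eval_interval=10_000,
                                 patience=3):
    """Train in chunks, evaluating on held-out data after each chunk.

    Stops once `patience` consecutive evaluations fail to improve on the
    best score seen so far, keeping the best checkpoint on disk.
    """
    best_score = float("-inf")
    bad_evals = 0

    for step in range(eval_interval, max_steps + 1, eval_interval):
        train_chunk(eval_interval)        # run the next eval_interval steps
        score = evaluate()                # performance on unseen data

        if score > best_score:
            best_score = score
            bad_evals = 0
            save_checkpoint(step)         # preserve the best model so far
        else:
            bad_evals += 1
            if bad_evals >= patience:     # no improvement for several
                break                     # evaluations: halt training

    return best_score
```

Using a patience window rather than stopping at the first drop guards against halting prematurely on evaluation noise; the checkpoint saved at the best score is what gets kept, not the final, degraded weights.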

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Comparison of Masked vs. Causal Language Modeling
Formal Definition of the Masking Process in MLM
Example of Masked Language Modeling with Single and Multiple Masks
Training Objective of Masked Language Modeling (MLM)
Drawback of Masked Language Modeling: The [MASK] Token Discrepancy
Limitation of MLM: Ignoring Dependencies Between Masked Tokens
The Generator in Replaced Token Detection
Consecutive Token Masking in MLM
Token Selection and Modification Strategy in BERT's MLM
BERT's Masked Language Modeling Pre-training Pipeline
Flexibility of Masked Language Modeling for Encoder-Decoder Training
Training Objective of the Standard BERT Model
During a self-supervised pre-training process, a model is given an input sequence where one word has been replaced by a special symbol, for example: 'The quick brown [MASK] jumps over the lazy dog.' The model's objective is to predict the original word, 'fox'. Which of the following is the direct input used by the final output layer to make this specific prediction?
Original Sequence for Masking and Deletion Examples
Arrange the following steps in the correct order to describe the process of pre-training an encoder model using a masked language modeling objective.
Evaluating a Pre-training Strategy for a Specific Application
Learn After
A machine learning engineer is pre-training a large language model. They monitor the model's performance on a separate, unseen dataset after every 10,000 training steps. They observe the following trend:
- Steps 1-100,000: Performance steadily improves.
- Step 110,000: The model achieves its best performance so far.
- Steps 120,000-150,000: Performance consistently worsens with each measurement.
Based on this observation, what is the most appropriate immediate action to ensure the best possible model is obtained from this training run?
Analyzing a Language Model's Pre-training Log
Rationale for Early Stopping in Model Pre-training