Concept

Performance Degradation and Early Stopping in Pre-training

During the pre-training of language models, performance can begin to decline after a certain point. This degradation is sometimes attributed to interference, where learning new information negatively impacts previously learned knowledge. To counteract this, a practical strategy is early stopping: halting training once performance (typically measured on a held-out validation set) stops improving, thereby preserving the model at its best observed state.
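The early-stopping strategy described above can be sketched as a simple loop with a patience counter. This is a minimal illustration, not the text's own implementation; the function name, the patience parameter, and the simulated loss sequence are assumptions for demonstration.

```python
# Minimal early-stopping sketch (illustrative; names and values are
# assumptions, not from the original text). Training is simulated by
# a fixed sequence of validation losses.

def early_stop_training(val_losses, patience=2):
    """Return (best_step, best_loss), stopping once the validation loss
    has not improved for `patience` consecutive evaluations."""
    best_loss = float("inf")
    best_step = -1
    bad_evals = 0
    for step, loss in enumerate(val_losses):
        if loss < best_loss:           # improvement: record it, reset counter
            best_loss, best_step = loss, step
            bad_evals = 0
        else:                          # no improvement this evaluation
            bad_evals += 1
            if bad_evals >= patience:  # stop early, keep the best checkpoint
                break
    return best_step, best_loss

# Losses improve, then degrade (e.g., due to interference).
losses = [2.0, 1.5, 1.2, 1.3, 1.4, 1.6]
print(early_stop_training(losses))  # → (2, 1.2): training halts at step 4
```

In practice the same logic runs inside the training loop, with the model checkpoint saved whenever `best_loss` improves so the best-performing weights can be restored after stopping.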

Updated 2025-10-06

Tags: Ch.2 Generative Models - Foundations of Large Language Models, Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences