1Cademy - Self-Training

Concept

Self-Training

Self-training is a machine learning technique in which a model is iteratively improved using its own predictions. The process begins with an initial model trained on a small set of labeled seed data. This model then predicts 'pseudo labels' for a larger pool of unlabeled data. The pseudo-labeled examples, usually combined with the original seed data, are used to retrain the model, and the cycle can repeat in a bootstrapping fashion.
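The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: the toy nearest-centroid classifier, the softmax-style confidence score, and the `threshold` and `max_rounds` parameters are all hypothetical choices made here. Keeping only confident pseudo labels is one common heuristic for limiting the risk of the model reinforcing its own mistakes.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Model = Dict[int, Point]  # label -> centroid (toy classifier, an assumption)


def fit_centroids(xs: List[Point], ys: List[int]) -> Model:
    """Train a toy nearest-centroid classifier: one mean point per label."""
    sums: Dict[int, Tuple[float, float, int]] = {}
    for (a, b), y in zip(xs, ys):
        sa, sb, n = sums.get(y, (0.0, 0.0, 0))
        sums[y] = (sa + a, sb + b, n + 1)
    return {y: (sa / n, sb / n) for y, (sa, sb, n) in sums.items()}


def predict(model: Model, x: Point) -> Tuple[int, float]:
    """Return (label, confidence); confidence is a softmax over negative distances."""
    dists = {y: math.dist(x, c) for y, c in model.items()}
    weights = {y: math.exp(-d) for y, d in dists.items()}
    label = min(dists, key=dists.get)
    return label, weights[label] / sum(weights.values())


def self_train(seed_x: List[Point], seed_y: List[int], unlabeled: List[Point],
               threshold: float = 0.9, max_rounds: int = 5) -> Tuple[Model, int]:
    """Hypothetical self-training loop with a confidence threshold (an assumption)."""
    train_x, train_y = list(seed_x), list(seed_y)
    pool = list(unlabeled)
    model = fit_centroids(train_x, train_y)  # initial model from seed data
    for _ in range(max_rounds):
        confident, remaining = [], []
        for x in pool:
            label, conf = predict(model, x)  # generate a pseudo label
            if conf >= threshold:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing new to learn from; stop early
        for x, label in confident:  # add pseudo-labeled data to the seed data
            train_x.append(x)
            train_y.append(label)
        pool = remaining
        model = fit_centroids(train_x, train_y)  # retrain on the combined set
    return model, len(train_x)


seed_x = [(0.0, 0.0), (10.0, 10.0)]
seed_y = [0, 1]
unlabeled = [(1.0, 1.0), (9.0, 9.0), (2.0, 0.0), (8.0, 10.0), (5.0, 5.0)]
model, size = self_train(seed_x, seed_y, unlabeled)
print(size)  # → 6: the ambiguous point (5, 5) never passes the threshold
```

The ambiguous point sits exactly between the two classes, so it is left unlabeled rather than forced into a class; lowering `threshold` would admit it, along with a higher chance of a wrong pseudo label.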

Updated 2026-04-14

References

Reference of Foundations of Large Language Models Course

Learn After

Historical Applications of Self-Training
Comparison of Self-Supervised Pre-training and Self-Training
A machine learning team is implementing a self-training procedure to improve a text classification model. They begin by training an initial model on a small, high-quality labeled dataset. They then use this model to predict labels for a vast collection of unlabeled text, creating 'pseudo labels'. Finally, they retrain the model on a combination of the original labeled data and the newly pseudo-labeled data. Which of the following describes the most critical risk inherent to this self-training approach?
A machine learning team has a small set of high-quality labeled data and a very large set of unlabeled data. They decide to use an iterative approach to improve their model's performance. Arrange the core steps of this process in the correct chronological order.
Evaluating a Model Training Strategy