Application of Synthetic Data in the Pre-training Stage
The use of synthetic data is not limited to the fine-tuning phase of LLM development. There is also significant research interest in applying synthetically generated data during pre-training, extending its utility to the foundational stage of model creation.
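As a minimal, hedged sketch of what such a pipeline might look like in practice: the `generate_document` function below is a stand-in for a call to any existing LLM (it is stubbed here so the pipeline runs end to end), and the filtering thresholds are illustrative assumptions, not prescribed values. The overall shape, prompting a generator model, then filtering and deduplicating its output before adding it to the pre-training corpus, is the common pattern.

```python
# Hypothetical sketch of a synthetic pre-training data pipeline.
# `generate_document` is a placeholder for a real LLM API call;
# the structure shown is: prompt -> generate -> filter -> dedupe -> corpus.

import hashlib


def generate_document(topic: str) -> str:
    # Stand-in for an LLM generation call (e.g. sampling a long passage
    # conditioned on a topic prompt). Stubbed for illustration.
    return f"An introductory passage about {topic}. " * 20


def build_synthetic_corpus(topics, min_chars=200):
    seen = set()
    corpus = []
    for topic in topics:
        doc = generate_document(topic)
        if len(doc) < min_chars:
            # Drop degenerate or truncated generations.
            continue
        digest = hashlib.sha256(doc.encode()).hexdigest()
        if digest in seen:
            # Exact-duplicate filtering; real pipelines often add
            # near-duplicate detection (e.g. MinHash) as well.
            continue
        seen.add(digest)
        corpus.append(doc)
    return corpus


corpus = build_synthetic_corpus(["quantum entanglement", "wave functions"])
print(len(corpus))  # documents retained after filtering and deduplication
```

The key design point the sketch illustrates is that synthetic pre-training data is rarely used raw: quality filtering and deduplication are what keep model-generated text from degrading the foundational corpus.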