Learn Before
Self-Instruct Process
The Self-Instruct method is an iterative process for growing a small seed set of instructions and samples into a large pool of fine-tuning data. In each cycle, instructions are sampled from the task pool and used as in-context demonstrations to prompt a Large Language Model (LLM) to write a new instruction. The new instruction, together with existing samples, is then used to prompt the LLM to generate a complete input-output pair. Each newly created sample is filtered for quality and novelty before being added to the task pool, which grows progressively richer over many repetitions.
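One cycle of the loop described above can be sketched as follows. The `llm` function is a hypothetical stand-in for a real model call, and the novelty filter here uses a simple word-overlap score rather than the ROUGE-L similarity used in the original Self-Instruct work; both are assumptions for illustration.

```python
import random

def llm(prompt):
    # Hypothetical stand-in for a real LLM call; in practice this would
    # send a few-shot prompt to a model API and return its completion.
    return "Summarize the following article in one sentence."

def is_novel(instruction, task_pool, threshold=0.7):
    # Simplified novelty filter: reject an instruction whose word overlap
    # with any pooled instruction exceeds the threshold. (The Self-Instruct
    # paper uses ROUGE-L similarity for this step.)
    new_words = set(instruction.lower().split())
    for task in task_pool:
        pool_words = set(task["instruction"].lower().split())
        overlap = len(new_words & pool_words) / max(len(new_words | pool_words), 1)
        if overlap > threshold:
            return False
    return True

def self_instruct_step(task_pool, n_demos=3):
    # 1. Sample existing instructions from the pool as in-context demos.
    demos = random.sample(task_pool, min(n_demos, len(task_pool)))
    demo_text = "\n".join(t["instruction"] for t in demos)

    # 2. Prompt the LLM to write a new instruction.
    new_instruction = llm(f"Come up with a new task:\n{demo_text}\nNew task:")

    # 3. Prompt the LLM again to produce an input-output pair for it.
    completion = llm(f"Instruction: {new_instruction}\nInput and output:")

    # 4. Keep the sample only if it passes the novelty filter.
    if is_novel(new_instruction, task_pool):
        task_pool.append({"instruction": new_instruction, "sample": completion})
    return task_pool

seed_pool = [
    {"instruction": "Translate the sentence into French.", "sample": "..."},
    {"instruction": "List three uses of a paperclip.", "sample": "..."},
]
pool = self_instruct_step(list(seed_pool))
```

Running many such steps, with the pool re-fed into each cycle, is what progressively enriches the task pool.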
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Self-Instruct Process
Bootstrapping LLMs with Self-Instruct from a Seed Dataset
Historical Precedent of Self-Generated Data in NLP
A development team wants to improve their large language model's ability to handle a wide variety of user requests. They plan to use the model itself to synthetically create a new, more diverse fine-tuning dataset. Which of the following strategies is the most crucial and defining step that distinguishes the 'Self-Instruct' method from other data generation approaches?
In the Self-Instruct method for generating fine-tuning data, the primary role of the large language model is to produce high-quality responses to a large, pre-existing set of diverse, human-written instructions.
Expanding LLM Capabilities with Synthetic Data
Your company is rolling out an instruction-tuned L...
You lead an LLM enablement team building an instru...
You’re leading an LLM platform team building an in...
Your company is building an internal IT helpdesk a...
Deciding Whether (and How) to Use Weak-Model Synthetic Data for Instruction Fine-Tuning
Diagnosing and Fixing a Synthetic Instruction-Tuning Data Flywheel That Degrades Model Behavior
Designing a Synthetic Instruction Fine-Tuning Pipeline Under Budget and Quality Constraints
Stabilizing an Instruction-Tuned Support Assistant When Synthetic Data Conflicts with Human Policy
Selecting and Filtering Self-Generated Instruction Data When Bootstrapping a Strong Model from a Weak Supervisor
Choosing a Weak-Model + Self-Instruct Data Strategy for Instruction Fine-Tuning Without Regressions
Learn After
Sample Generation in Self-Instruct
Filtering in Self-Instruct
Task Pool in Self-Instruct
Initialization of the Task Pool in Self-Instruct
Instruction Generation in Self-Instruct
Refining Prompt Templates in Self-Instruct
An AI development team wants to expand a small, manually-created set of instruction-following data into a much larger dataset for fine-tuning a language model. They decide to use the model itself to generate new data in an iterative loop. Which of the following procedures correctly describes the core cycle for generating one new, high-quality data point?
A team is using an iterative method to generate a large dataset for fine-tuning a language model, starting from a small set of examples. Arrange the core steps of a single cycle of this process in the correct order.
Diagnosing a Data Generation Pipeline Issue