Case Study

Selecting and Filtering Self-Generated Instruction Data When Bootstrapping a Strong Model from a Weak Supervisor

You lead an internal team fine-tuning a pre-trained LLM into a customer-support assistant for your company’s enterprise software. You have only 1,000 human-written, high-quality instruction–response examples (covering tone, policy, and product accuracy). To scale, you are considering two synthetic data sources (both sketched in code below):

A) Self-Instruct expansion: use a strong off-the-shelf LLM to generate new instructions plus responses from your 1,000 seeds, producing 200,000 instruction–response pairs.

B) Weak-to-strong bootstrapping: use your current small in-house model (known to be polite but sometimes wrong on product details) to generate responses for 200,000 automatically generated instructions, then fine-tune your strong target model to match those responses.
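
To make the two options concrete, here is a minimal sketch of what each pipeline might look like. All model handles and helper methods (strong_llm, weak_model, generate_instruction, generate_response, and so on) are hypothetical placeholders, not a specific API; the point is only where the instructions and the responses come from in each option.

```python
import random

def self_instruct_expansion(seed_pairs, strong_llm, target_size=200_000):
    """Option A: a strong off-the-shelf LLM writes BOTH new instructions and
    their responses, conditioned on a few human-written seed examples."""
    synthetic = []
    while len(synthetic) < target_size:
        demos = random.sample(seed_pairs, k=4)                 # in-context seed demonstrations
        instruction = strong_llm.generate_instruction(demos)   # new task in the seeds' style
        response = strong_llm.generate_response(instruction)   # strong model answers its own task
        synthetic.append({"instruction": instruction,
                          "response": response,
                          "source": "self_instruct"})
    return synthetic

def weak_to_strong_bootstrapping(auto_instructions, weak_model):
    """Option B: instructions come from an automatic generator, but the
    responses are labeled by the small in-house model (the weak supervisor);
    the strong target model is then fine-tuned to imitate these responses."""
    return [{"instruction": inst,
             "response": weak_model.generate_response(inst),   # may contain product-detail errors
             "source": "weak_labeled"}
            for inst in auto_instructions]
```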

After a pilot run, you observe that: (1) the fine-tuned model follows the target formatting and tone more consistently; (2) it is noticeably more confident in a few recurring incorrect product claims that mirror the small in-house model’s mistakes; and (3) adding more unfiltered synthetic data makes these incorrect claims more frequent.

As the person accountable for the next iteration, propose a concrete data strategy (what to generate, what to keep or remove, and what to prioritize) that uses instruction fine-tuning effectively while managing the trade-off between scaling via automatic/self-generated data and the risk of inheriting weak-model errors. Your answer must explicitly explain how your selection/filtering choices change the influence of Self-Instruct data versus weak-model-labeled data on the final model’s behavior.
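
One way to make the deliverable concrete is to think of it as a single selection pass over the pooled synthetic data, with different gates for each source. The sketch below is only one possible shape of such a pass, under the assumption that you have some external correctness check and quality score available; fact_check_against_kb and quality_score are hypothetical placeholders (a product knowledge-base lookup, a reward model, a near-duplicate filter, or similar). The per-source gates and the cap on the weak-labeled share are the levers that mechanically shift how much influence each pipeline has on the final model.

```python
def select_training_data(synthetic, seed_pairs,
                         fact_check_against_kb, quality_score,
                         max_weak_fraction=0.2):
    """Keep an example only if it passes source-specific checks; the per-source
    gates and caps control how much of each pipeline's behavior (strong-model
    style vs. weak-model product claims) the target model can inherit."""
    kept = []
    for ex in synthetic:
        if ex["source"] == "weak_labeled":
            # Weak-model responses carry the known product-detail errors,
            # so gate them on an external correctness check.
            if not fact_check_against_kb(ex["response"]):
                continue
        if quality_score(ex["instruction"], ex["response"]) < 0.5:  # threshold is illustrative
            continue
        kept.append(ex)

    # Cap the weak-labeled share so the frequency of inherited errors cannot
    # grow simply because more weak-labeled data is added.
    weak = [e for e in kept if e["source"] == "weak_labeled"]
    other = [e for e in kept if e["source"] != "weak_labeled"]
    budget = int(max_weak_fraction * (len(other) + len(seed_pairs)))
    return seed_pairs + other + weak[:budget]
```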



Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.4 Alignment - Foundations of Large Language Models
