Essay

Diagnosing and Fixing a Synthetic Instruction-Tuning Data Flywheel That Degrades Model Behavior

You lead an LLM enablement team building an internal “policy & procedures assistant” for a regulated enterprise. Because expert-labeled data is scarce, you create an instruction fine-tuning dataset using an automatic pipeline: (1) start from 300 expert-written seed instructions with gold answers, (2) use a weaker in-house model to generate new instructions and draft answers in a Self-Instruct-style loop, and (3) fine-tune a stronger model on the resulting instruction–response pairs. After two iterations, offline eval shows the strong model is more fluent and compliant in tone, but it now (a) confidently invents policy details, (b) overuses templated phrasing, and (c) performs worse on a small set of “hard” edge-case questions that the weak model also struggled with.
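The generation step described above can be sketched as a toy loop. This is a minimal illustration only: `weak_model_generate` is a hypothetical stand-in for the in-house weak model (a real pipeline would call its inference API), and the templates and seed questions are invented for the example.

```python
import random

def weak_model_generate(prompt, rng):
    # Hypothetical stand-in for the in-house weak model; a real pipeline
    # would call the model's API instead of picking from fixed templates.
    templates = [
        "Summarize the retention policy for {topic}.",
        "List the approval steps required for {topic}.",
        "Explain the escalation procedure for {topic}.",
    ]
    topic = prompt.split()[-1].rstrip(".?")
    return rng.choice(templates).format(topic=topic)

def self_instruct_round(seed_pool, n_new, rng):
    """One Self-Instruct-style round: sample a seed instruction, ask the
    weak model for a new instruction, and keep only novel candidates."""
    new_instructions = []
    for _ in range(n_new):
        seed = rng.choice(seed_pool)
        candidate = weak_model_generate(seed, rng)
        if candidate not in seed_pool and candidate not in new_instructions:
            new_instructions.append(candidate)
    return new_instructions

seeds = [
    "What is the retention policy for contracts?",
    "Who approves exceptions to the travel policy?",
]
print(self_instruct_round(seeds, n_new=5, rng=random.Random(0)))
```

Note how the toy loop already exhibits the failure mode in the prompt: because candidates are drawn from the weak model's own (here, templated) distribution, each iteration narrows diversity and amplifies the weak model's phrasing habits.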

Write a recommendation memo that (i) diagnoses the most likely causal chain linking instruction fine-tuning, Self-Instruct/automatic data generation, data selection/filtering, and weak-to-strong generalization to these specific failure modes, and (ii) proposes a revised data strategy for the next iteration. Your proposal must include: what you would change about how instructions are generated, how you would filter/select data (with at least two concrete selection criteria or signals), and how you would use (or limit) weak-model-generated labels so the strong model improves without inheriting the weak model’s errors. Justify the trade-offs you are making (coverage vs. quality, diversity vs. consistency, and cost vs. risk).
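As a concrete reference point for the "at least two selection criteria" requirement, here is a minimal sketch of a greedy filter combining a diversity signal (lexical overlap against already-kept examples) and a quality signal (a templated-boilerplate check). The thresholds, phrase list, and function names are illustrative assumptions, not part of the assignment.

```python
def _tokens(s):
    # Crude normalization: lowercase and strip sentence punctuation.
    return set(s.lower().replace(".", " ").replace("?", " ").split())

def unigram_jaccard(a, b):
    sa, sb = _tokens(a), _tokens(b)
    return len(sa & sb) / max(1, len(sa | sb))

# Hypothetical boilerplate phrases signaling templated, low-information text.
BOILERPLATE = ("as an ai", "i cannot", "it is important to note")

def select(candidates, kept=None, max_sim=0.7):
    """Greedy selection with two signals:
    1. quality: drop candidates containing templated boilerplate phrases;
    2. diversity: drop candidates too similar to anything already kept."""
    kept = list(kept or [])
    for cand in candidates:
        text = cand.lower()
        if any(phrase in text for phrase in BOILERPLATE):
            continue
        if any(unigram_jaccard(cand, k) > max_sim for k in kept):
            continue
        kept.append(cand)
    return kept
```

A memo answer would replace these cheap heuristics with stronger signals (embedding-based near-duplicate detection, reference-grounded factuality checks, or agreement between independent generations), but the structure, score each candidate on independent diversity and quality axes, then select greedily, is the same.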

Updated 2026-02-06

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.4 Alignment - Foundations of Large Language Models
