A research team is developing a powerful new language model for summarizing scientific papers. Lacking a large, human-curated dataset of summaries, they use an older, less accurate model to generate summaries for 100,000 papers. They then fine-tune their powerful new model on this machine-generated dataset, with the goal of teaching it to produce summaries that match the ones from the older model. What is the most significant inherent risk in this training strategy?
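The strategy described above amounts to using the weak model's summaries as pseudo-labels and minimizing the strong model's negative log-likelihood on them. A minimal sketch of that objective, using a toy token-level representation (the function name, the probability-dict encoding, and the example values are all hypothetical, not from any specific library):

```python
import math

def weak_supervision_nll(strong_probs, weak_target_ids):
    """Average negative log-likelihood of the weak model's output tokens
    under the strong model's predicted distributions -- the standard
    fine-tuning objective when weak-model summaries serve as targets.

    strong_probs: per-position dicts mapping token_id -> probability,
                  as predicted by the strong model.
    weak_target_ids: token ids actually emitted by the weak model.
    """
    nll = 0.0
    for probs, target in zip(strong_probs, weak_target_ids):
        # A floor avoids log(0) when the strong model assigns ~zero mass
        # to the weak model's token.
        nll -= math.log(probs.get(target, 1e-12))
    return nll / len(weak_target_ids)

# Toy illustration of the inherent risk: at position 2 the strong model
# already prefers token 0 (p=0.9), but the weak model emitted token 1,
# so gradient descent on this loss pushes the strong model *toward* the
# weak model's choice -- capping it at the weak teacher's quality.
strong_probs = [{0: 0.7, 1: 0.3}, {0: 0.9, 1: 0.1}]
weak_targets = [0, 1]
loss = weak_supervision_nll(strong_probs, weak_targets)
```

The sketch makes the risk concrete: the loss is minimized exactly when the strong model reproduces the weak model's token choices, errors included, rather than when it writes the best possible summary.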
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Objective Function for Fine-Tuning a Strong LLM with Weak Supervision
Training Strategy for a Legal AI
Visual Diagram of Weak-to-Strong Generalization via Data Selection
A team is implementing a strategy where a powerful language model learns from a less capable one. Arrange the following steps into the correct chronological order to describe this process.
Your company is rolling out an instruction-tuned L...
You lead an LLM enablement team building an instru...
You’re leading an LLM platform team building an in...
Your company is building an internal IT helpdesk a...
Deciding Whether (and How) to Use Weak-Model Synthetic Data for Instruction Fine-Tuning
Diagnosing and Fixing a Synthetic Instruction-Tuning Data Flywheel That Degrades Model Behavior
Designing a Synthetic Instruction Fine-Tuning Pipeline Under Budget and Quality Constraints
Stabilizing an Instruction-Tuned Support Assistant When Synthetic Data Conflicts with Human Policy
Selecting and Filtering Self-Generated Instruction Data When Bootstrapping a Strong Model from a Weak Supervisor
Choosing a Weak-Model + Self-Instruct Data Strategy for Instruction Fine-Tuning Without Regressions