Diagnosing Dataset Generation Issues
Based on the methodology described in the case study, what is the most likely flaw in the team's selection strategy that is causing the observed decrease in diversity, and why is this flaw detrimental?
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Diagnosing Dataset Generation Issues
A research team is using a self-instruction method to generate a large dataset of tasks. In their process, for each new generation step, they exclusively sample from the small, initial set of human-written examples to prompt the language model. What is the most probable outcome for the final dataset if they follow this strategy?
Rationale for Mixed Instruction Sampling
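
The sampling choice raised in the related question above can be made concrete with a short sketch. It assumes a self-instruction loop in which each generation step builds its prompt from a handful of in-context examples drawn from two pools: the human-written seed tasks and the tasks the model has already generated. The function name `sample_prompt_examples`, the pool names, and the 6/2 split are illustrative assumptions, not details taken from the case study. Setting `n_generated=0` reproduces the seed-only strategy that the question describes.

```python
import random


def sample_prompt_examples(seed_pool, generated_pool,
                           n_seed=6, n_generated=2, rng=None):
    """Pick in-context examples for one generation step.

    Hypothetical helper: the names and the default 6/2 split are
    assumptions made for illustration only.
    """
    rng = rng or random
    # Early in the run the generated pool may hold fewer tasks than requested.
    k_generated = min(n_generated, len(generated_pool))
    # Fill any shortfall with extra seed tasks so the prompt size stays stable.
    k_seed = min(n_seed + (n_generated - k_generated), len(seed_pool))
    return rng.sample(seed_pool, k_seed) + rng.sample(generated_pool, k_generated)


if __name__ == "__main__":
    seeds = [f"seed-{i}" for i in range(10)]
    generated = [f"gen-{i}" for i in range(5)]
    rng = random.Random(0)

    # Mixed sampling: the prompt sees both seed and model-generated tasks.
    print(sample_prompt_examples(seeds, generated, rng=rng))

    # Seed-only sampling: every prompt is drawn from the same small seed set.
    print(sample_prompt_examples(seeds, generated,
                                 n_seed=8, n_generated=0, rng=rng))
```

With seed-only sampling, every prompt is assembled from the same ten tasks no matter how large the generated pool grows, which is the setup the question asks the reader to reason about.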