Rationale for Mixed Instruction Sampling
In a self-instruction process for generating new tasks, a common strategy is to sample from a pool containing both the original, human-created seed instructions and the instructions previously generated by the model. Explain the primary reason for including both types of instructions in the sampling pool, rather than relying on just one type.
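The sampling step the question describes can be sketched in Python. Each generation prompt draws a few human-written seed instructions and a few instructions the model produced in earlier rounds, then formats them as in-context examples. This is a minimal sketch under stated assumptions. The 6-seed / 2-generated split, the function names `sample_demonstrations` and `build_prompt`, and the prompt header are hypothetical illustrative choices, not something the question specifies.

```python
import random
from typing import List, Optional


def sample_demonstrations(
    seed_pool: List[str],
    generated_pool: List[str],
    n_seed: int = 6,       # assumed count of human-written seeds per prompt
    n_generated: int = 2,  # assumed count of model-generated instructions per prompt
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Hypothetical helper: draw in-context examples from both pools."""
    rng = rng or random.Random()
    # Draw from the original human-written seed instructions.
    seeds = rng.sample(seed_pool, min(n_seed, len(seed_pool)))
    # Draw from instructions generated in earlier rounds; this pool may be
    # empty or small at the start, so take at most what is available.
    generated = rng.sample(generated_pool, min(n_generated, len(generated_pool)))
    demos = seeds + generated
    # Shuffle so the source of each example is not revealed by its position.
    rng.shuffle(demos)
    return demos


def build_prompt(demos: List[str]) -> str:
    """Hypothetical prompt format: numbered tasks, leaving the next slot open."""
    lines = ["Come up with a series of tasks:"]
    for i, task in enumerate(demos, start=1):
        lines.append(f"Task {i}: {task}")
    lines.append(f"Task {len(demos) + 1}:")
    return "\n".join(lines)


if __name__ == "__main__":
    seeds = [f"Seed task {i}" for i in range(10)]
    generated = [f"Generated task {i}" for i in range(5)]
    demos = sample_demonstrations(seeds, generated, rng=random.Random(0))
    print(build_prompt(demos))
```

In the first round the generated pool is empty, so every prompt consists only of seeds. The model-generated share grows only as accepted instructions are added back to `generated_pool`.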
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Diagnosing Dataset Generation Issues
A research team is using a self-instruction method to generate a large dataset of tasks. In their process, for each new generation step, they exclusively sample from the small, initial set of human-written examples to prompt the language model. What is the most probable outcome for the final dataset if they follow this strategy?