Learn Before
Task Pool in Self-Instruct
The Self-Instruct algorithm revolves around a dynamic task pool. This pool is initially populated with a set of manually created seed tasks and is continuously expanded as the algorithm runs by adding new instructions and samples generated by a Large Language Model.
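The growth of the pool described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the reference Self-Instruct implementation: the `Task` class, the `expand_pool` function, the `generate` callable (a stand-in for a call to a Large Language Model), and the exact-duplicate check (a stand-in for whatever filtering step is actually used) are all hypothetical names introduced here.

```python
import random
from dataclasses import dataclass, field
from itertools import count
from typing import Callable, List, Optional


@dataclass
class Task:
    """One entry in the pool: an instruction plus its samples (hypothetical layout)."""
    instruction: str
    samples: List[str] = field(default_factory=list)
    is_seed: bool = False


def expand_pool(
    seed_tasks: List[Task],
    generate: Callable[[List[Task]], Task],
    rounds: int,
    k: int = 2,
    rng: Optional[random.Random] = None,
) -> List[Task]:
    """Start from the seed tasks and grow the pool for a fixed number of rounds."""
    rng = rng or random.Random(0)
    pool = list(seed_tasks)  # the pool is initialized with the seed tasks
    for _ in range(rounds):
        # Draw a few tasks from the *current* pool, so earlier generations
        # can serve as demonstrations for later ones.
        demos = rng.sample(pool, min(k, len(pool)))
        new_task = generate(demos)  # assumed to wrap an LLM call
        # Assumed placeholder filter: reject empty or exactly repeated instructions.
        known = {t.instruction for t in pool}
        if new_task.instruction and new_task.instruction not in known:
            pool.append(new_task)  # accepted tasks are added back to the pool
    return pool


def make_fake_generator() -> Callable[[List[Task]], Task]:
    """A deterministic stand-in for the language model, for demonstration only."""
    counter = count(1)

    def fake_generate(demos: List[Task]) -> Task:
        n = next(counter)
        return Task(f"Generated instruction {n}", [f"sample for task {n}"])

    return fake_generate


seeds = [
    Task("Translate the sentence into French.", is_seed=True),
    Task("Summarize the paragraph in one sentence.", is_seed=True),
]
pool = expand_pool(seeds, make_fake_generator(), rounds=3)
print(len(pool))  # 2 seed tasks plus 3 accepted generations
```

The point of the sketch is the feedback loop: demonstrations are drawn from `pool` rather than from `seeds`, so every accepted task widens the set of examples available in later rounds.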
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Sample Generation in Self-Instruct
Filtering in Self-Instruct
Task Pool in Self-Instruct
Initialization of the Task Pool in Self-Instruct
Instruction Generation in Self-Instruct
Refining Prompt Templates in Self-Instruct
An AI development team wants to expand a small, manually created set of instruction-following data into a much larger dataset for fine-tuning a language model. They decide to use the model itself to generate new data in an iterative loop. Which of the following procedures correctly describes the core cycle for generating one new, high-quality data point?
A team is using an iterative method to generate a large dataset for fine-tuning a language model, starting from a small set of examples. Arrange the core steps of a single cycle of this process in the correct order.
Diagnosing a Data Generation Pipeline Issue
Learn After
Structure of a Task Sample in Self-Instruct
An engineer is implementing a process to generate training data. The process begins with 100 manually created instructional prompts. In each cycle, the system uses a language model to generate 20 new prompts, which are then reviewed for quality and added to the existing set. Which statement best analyzes the state of the prompt collection after 10 successful cycles?
A team is developing a system to generate instructional data. They begin with a fixed set of 500 human-written tasks. A language model is then prompted using only these 500 tasks to generate thousands of new examples. The newly generated instructions are collected for the final dataset but are never added back to the original pool of 500 tasks. What is the most significant limitation of this approach?
A team is using an automated process to expand a collection of instructional tasks, starting from a small set of human-written examples. Arrange the following events to show the correct sequence for how a single new, high-quality task is generated and integrated into the collection.