Learn Before
Initialization of the Task Pool in Self-Instruct
The Self-Instruct process begins by initializing a task pool with a set of seed tasks. These seed tasks are hand-crafted, and each consists of an instruction paired with a corresponding input-output sample.
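A minimal sketch of this initialization step, assuming a simple in-memory list as the task pool. The names `Task`, `init_task_pool`, `SEED_TASKS`, and the `is_seed` flag are hypothetical, and the validation and duplicate checks are illustrative assumptions rather than part of the method as described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One task: an instruction plus a single input-output sample."""
    instruction: str
    input: str   # assumption: may be empty if the instruction needs no input
    output: str
    is_seed: bool = True  # hypothetical flag: hand-crafted seed vs. generated


def init_task_pool(seed_tasks):
    """Build the initial task pool from hand-crafted seed tasks.

    Raises on an empty instruction or output and skips repeated
    instructions; both checks are illustrative assumptions.
    """
    pool = []
    seen = set()
    for task in seed_tasks:
        if not task.instruction.strip() or not task.output.strip():
            raise ValueError(f"incomplete seed task: {task!r}")
        key = task.instruction.strip().lower()
        if key in seen:
            continue  # same instruction already in the pool
        seen.add(key)
        pool.append(task)
    return pool


# Two hand-written seed tasks, each with an instruction and one sample.
SEED_TASKS = [
    Task("Translate the sentence into French.", "Good morning.", "Bonjour."),
    Task("Give an antonym for the word.", "hot", "cold"),
]

task_pool = init_task_pool(SEED_TASKS)
print(len(task_pool))
```

Later steps would append model-generated tasks to `task_pool` with `is_seed=False`, so the hand-crafted seeds stay distinguishable from generated data.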
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Sample Generation in Self-Instruct
Filtering in Self-Instruct
Task Pool in Self-Instruct
Initialization of the Task Pool in Self-Instruct
Instruction Generation in Self-Instruct
Refining Prompt Templates in Self-Instruct
An AI development team wants to expand a small, manually created set of instruction-following data into a much larger dataset for fine-tuning a language model. They decide to use the model itself to generate new data in an iterative loop. Which of the following procedures correctly describes the core cycle for generating one new, high-quality data point?
A team is using an iterative method to generate a large dataset for fine-tuning a language model, starting from a small set of examples. Arrange the core steps of a single cycle of this process in the correct order.
Diagnosing a Data Generation Pipeline Issue
Learn After
Sampling in Self-Instruct
A research team is initiating a process to enhance a language model's ability to generate Python code from natural language descriptions. For their initial task pool, they gather 1,000 random Python code snippets from public repositories. Based on the principles of initializing this process, what is the primary weakness of their approach?
Evaluating Seed Task Suitability
In a methodology designed to bootstrap a large set of instructional data, the initial 'seed' tasks used to start the process are typically generated automatically by a language model.