Sample Generation in Self-Instruct
Once a new instruction has been created in the Self-Instruct framework, a large language model is prompted to generate a complete sample for it. The newly generated instruction guides the model in producing any required input fields and the corresponding output, which completes the data instance.
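
The step described above can be sketched in a few lines of Python. This is a minimal illustration, not the framework's actual implementation: the prompt wording, the `Input:`/`Output:` markers, the `<noinput>` placeholder, and the names `build_sample_prompt`, `parse_sample`, `generate_sample`, and `fake_llm` are all assumptions made for this example. The model is represented by any callable that maps a prompt string to a completion string, and `fake_llm` is a canned stand-in so the sketch runs without a real model.

```python
from typing import Callable, Dict

# Hypothetical prompt template. The wording is an assumption for illustration,
# not the template used by Self-Instruct.
SAMPLE_PROMPT = (
    "Instruction: {instruction}\n"
    "Write an input for this instruction (use <noinput> if none is needed), "
    "then write the output.\n"
    "Use this format:\n"
    "Input: ...\n"
    "Output: ...\n"
)


def build_sample_prompt(instruction: str) -> str:
    """Fill the newly generated instruction into the prompt template."""
    return SAMPLE_PROMPT.format(instruction=instruction)


def parse_sample(instruction: str, completion: str) -> Dict[str, str]:
    """Split a completion of the form 'Input: ... Output: ...' into fields."""
    if "Input:" not in completion:
        raise ValueError("completion has no Input field")
    after_input = completion.split("Input:", 1)[1]
    if "Output:" not in after_input:
        raise ValueError("completion has no Output field after the Input field")
    input_text, output_text = after_input.split("Output:", 1)
    return {
        "instruction": instruction,
        "input": input_text.strip(),
        "output": output_text.strip(),
    }


def generate_sample(instruction: str, llm: Callable[[str], str]) -> Dict[str, str]:
    """Prompt the model with the instruction and assemble the data instance."""
    completion = llm(build_sample_prompt(instruction))
    return parse_sample(instruction, completion)


def fake_llm(prompt: str) -> str:
    """Canned stand-in for a real model call (illustrative text only)."""
    return (
        "Input: A young musician is rejected by several schools, loses her "
        "instrument, and still earns a place in the national orchestra.\n"
        "Output: The main character overcomes several obstacles to achieve "
        "their lifelong dream."
    )


if __name__ == "__main__":
    sample = generate_sample(
        "Summarize the provided text into a single sentence.", fake_llm
    )
    for field, value in sample.items():
        print(f"{field}: {value}")
```

Raising an error when either field is absent is one possible design choice; it keeps incomplete samples (for example, an output with no input for an instruction that needs one) from entering the dataset before the later filtering step.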

Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Sample Generation in Self-Instruct
Filtering in Self-Instruct
Task Pool in Self-Instruct
Initialization of the Task Pool in Self-Instruct
Instruction Generation in Self-Instruct
Refining Prompt Templates in Self-Instruct
An AI development team wants to expand a small, manually-created set of instruction-following data into a much larger dataset for fine-tuning a language model. They decide to use the model itself to generate new data in an iterative loop. Which of the following procedures correctly describes the core cycle for generating one new, high-quality data point?
A team is using an iterative method to generate a large dataset for fine-tuning a language model, starting from a small set of examples. Arrange the core steps of a single cycle of this process in the correct order.
Diagnosing a Data Generation Pipeline Issue
Instruction Sampling for Diversity in Self-Instruct
Example of a Prompt Template for Instruction Generation in Self-Instruct
An AI development team provides a large language model with a prompt containing several existing task instructions, such as 'Translate this sentence into French' and 'Write a poem about the ocean.' The prompt then asks the model to generate a new, distinct instruction based on the examples provided. What is the primary function of including the existing instructions in the prompt?
Automated Task Creation for a Marketing Dataset
Analyzing a Flawed Prompt for Instruction Generation
Learn After
Filtering in Self-Instruct
In an automated process for generating training data, a language model has just created a new, unique instruction: 'Write a product description for a fictional gadget.' To complete the data instance for this instruction, what is the essential next task for the model?
Example of a Prompt Template for Sample Generation in Self-Instruct
An automated system for creating training data has just generated a new instruction: 'Summarize the provided text into a single sentence.' In the subsequent step, the system produces the following text: 'The main character overcomes several obstacles to achieve their lifelong dream.' Based on the requirements for creating a complete data instance, what crucial component is missing from this generated sample?
Diagnosing a Flaw in an Automated Data Generation Process