Troubleshooting an Automated Data Generation Process
A research team is using a large language model to automatically generate new training examples. Each example should consist of an 'instruction', a user 'input', and a corresponding 'output'. The team provides the model with several high-quality examples in this format, followed by a final instruction. However, they find that the model often just slightly rephrases one of the provided examples instead of creating a genuinely new one.
Analyze the team's final instruction to the model, provided in the case study below. Explain why this instruction is likely causing the problem and propose a specific, revised instruction to resolve the issue.
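The setup described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the team's actual pipeline. The seed examples, the wording of `FINAL_INSTRUCTION`, and the helper names (`build_prompt`, `is_novel`) are all hypothetical. The sketch does two things. It closes the few-shot prompt with an instruction that explicitly asks for a different task and forbids copying. It then applies a post-generation check that rejects any candidate whose instruction is too close to a demonstration. The check uses the standard library's `difflib.SequenceMatcher` as a stand-in for a stronger similarity measure such as ROUGE-L or embedding similarity, and the 0.7 cutoff is an arbitrary assumption.

```python
from difflib import SequenceMatcher

# Hypothetical seed demonstrations in the three-part format.
SEED_EXAMPLES = [
    {
        "instruction": "Summarize the text.",
        "input": "The meeting moved to Friday because two members were traveling.",
        "output": "The meeting was rescheduled to Friday due to travel.",
    },
    {
        "instruction": "Translate the sentence into French.",
        "input": "Good morning.",
        "output": "Bonjour.",
    },
]

# Hypothetical revised final instruction: asks for a new task and
# explicitly rules out copying or paraphrasing the demonstrations.
FINAL_INSTRUCTION = (
    "Write one new example in the same three-part format. "
    "The new instruction must describe a task that differs from every "
    "example above, with its own input and a correct output. "
    "Do not copy or paraphrase any example."
)


def format_example(ex: dict) -> str:
    """Render one example as labeled Instruction/Input/Output lines."""
    return (
        f"Instruction: {ex['instruction']}\n"
        f"Input: {ex['input']}\n"
        f"Output: {ex['output']}"
    )


def build_prompt(examples: list, final_instruction: str) -> str:
    """Join the demonstrations and append the final instruction last."""
    blocks = [format_example(ex) for ex in examples]
    return "\n\n".join(blocks + [final_instruction])


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (stand-in for ROUGE-L)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_novel(candidate: dict, examples: list, threshold: float = 0.7) -> bool:
    """Accept a candidate only if its instruction field is below the
    (assumed) similarity threshold against every demonstration."""
    return all(
        similarity(candidate["instruction"], ex["instruction"]) < threshold
        for ex in examples
    )


if __name__ == "__main__":
    print(build_prompt(SEED_EXAMPLES, FINAL_INSTRUCTION))
```

A rejected candidate can be discarded and the model re-prompted. One might also vary or shuffle the demonstrations between calls so the model is not anchored on the same few examples each time.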
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A research team is using an automated process to generate new training data. The process involves showing a language model a few high-quality examples, where each example consists of an 'instruction', a user 'input', and a corresponding 'output'. The goal is for the model to then create a completely new, well-formed example that follows the same three-part structure. Which of the following prompts, given to the model after the examples, would be most effective and precise for this task?
Evaluating a Prompt Template for Data Generation