1Cademy - Troubleshooting an Automated Data Generation Process

Learn Before

Example of a Prompt Template for Sample Generation in Self-Instruct

Case Study

Troubleshooting an Automated Data Generation Process

A research team is using a large language model to automatically generate new training examples. Each example should consist of an 'instruction', a user 'input', and a corresponding 'output'. The team provides the model with several high-quality examples in this format, followed by a final instruction. However, they find that the model often returns a lightly rephrased copy of one of the provided examples instead of a genuinely new one.
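To make the setup concrete, here is a minimal Python sketch of how such a prompt might be assembled and how a rephrased copy might be flagged. It is not the team's actual pipeline: the field labels, the function names (`format_example`, `build_prompt`, `is_near_duplicate`), the 0.8 threshold, and the sample data are all assumptions made for illustration, and the final instruction is left as a parameter because its wording is what the case study asks you to analyze.

```python
from difflib import SequenceMatcher


def format_example(example):
    """Render one example in the instruction / input / output layout (labels are assumed)."""
    return (
        f"Instruction: {example['instruction']}\n"
        f"Input: {example['input']}\n"
        f"Output: {example['output']}"
    )


def build_prompt(examples, final_instruction):
    """Join the seed examples, then append the final instruction as the last block."""
    blocks = [format_example(ex) for ex in examples]
    return "\n\n".join(blocks + [final_instruction])


def max_similarity(candidate, examples):
    """Highest character-level similarity between the candidate and any seed example."""
    cand_text = format_example(candidate).lower()
    return max(
        SequenceMatcher(None, cand_text, format_example(ex).lower()).ratio()
        for ex in examples
    )


def is_near_duplicate(candidate, examples, threshold=0.8):
    """Flag a generated sample that closely mirrors a seed (threshold is an arbitrary choice)."""
    return max_similarity(candidate, examples) >= threshold


if __name__ == "__main__":
    # Hypothetical seed data, invented purely for this sketch.
    seeds = [
        {
            "instruction": "Classify the sentiment of the review.",
            "input": "The film was wonderful.",
            "output": "positive",
        },
    ]
    # Placeholder only; the real wording is the subject of the case study.
    prompt = build_prompt(seeds, "<final instruction goes here>")
    print(prompt)

    generated = {
        "instruction": "Classify the sentiment of this review.",
        "input": "The movie was wonderful.",
        "output": "positive",
    }
    print(is_near_duplicate(generated, seeds))
```

A character-level ratio from `difflib` is a rough proxy for "slightly rephrased"; a word-overlap or embedding-based measure could be swapped in, and a check like this only detects the symptom rather than fixing the instruction that causes it.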

Analyze the team's final instruction to the model, provided in the case study below. Explain why this instruction is likely causing the problem and propose a specific, revised instruction to resolve the issue.

Updated 2025-10-05
