Designing Filtering Rules for a Specialized AI
Imagine you are preparing a dataset to fine-tune a large language model to summarize complex scientific research papers into plain language for a general audience. The initial dataset consists of thousands of full-text academic articles scraped from various online repositories. Propose a set of three distinct rule-based filters (heuristics) you would apply to this dataset to improve its quality for the specified task. For each filter, you must:
- Clearly state the rule.
- Justify why this rule is necessary, explaining the specific data quality issue it targets and its potential impact on the model's final performance.
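As a point of reference for what "rule-based filter" means in practice, the following is a minimal sketch in which each rule is a predicate over a document and a document is kept only if every rule passes. The record layout (a dict with a `text` field), the helper names `min_word_count`, `contains_keyword`, and `apply_filters`, and the thresholds are all hypothetical placeholders chosen for illustration; they are not a recommended answer to the exercise.

```python
from typing import Callable, Dict, List

# Hypothetical record layout: each document is a dict with a "text" field.
Document = Dict[str, str]
Rule = Callable[[Document], bool]


def min_word_count(threshold: int) -> Rule:
    """Build a rule that keeps documents with at least `threshold` words."""
    def rule(doc: Document) -> bool:
        return len(doc.get("text", "").split()) >= threshold
    return rule


def contains_keyword(keyword: str) -> Rule:
    """Build a rule that keeps documents mentioning `keyword` (case-insensitive)."""
    def rule(doc: Document) -> bool:
        return keyword.lower() in doc.get("text", "").lower()
    return rule


def apply_filters(docs: List[Document], rules: List[Rule]) -> List[Document]:
    """Keep only the documents that satisfy every rule."""
    return [doc for doc in docs if all(rule(doc) for rule in rules)]


# Illustrative usage with made-up documents and placeholder thresholds.
docs = [
    {"text": "Abstract. " + "word " * 10},
    {"text": "too short"},
]
kept = apply_filters(docs, [min_word_count(5), contains_keyword("abstract")])
```

Expressing each heuristic as a small, independent predicate makes it easier to justify rules one at a time and to measure how many documents each rule removes.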
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Creation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A team is preparing a dataset to fine-tune a large language model for a customer service chatbot. The raw data, collected from public online forums, contains many instances of toxic language, messages shorter than five words, and conversations that are not in English. The primary goal is to improve the model's ability to provide helpful, safe, and coherent responses in English. Which of the following filtering rules would be the most effective first step to improve the quality of this specific dataset for the intended task?
Unintended Consequences of Data Filtering