Learn Before
A team is preparing a dataset to fine-tune a large language model for a customer service chatbot. The raw data, collected from public online forums, contains many instances of toxic language, messages shorter than five words, and conversations that are not in English. The primary goal is to improve the model's ability to provide helpful, safe, and coherent responses in English. Which of the following filtering rules would be the most effective first step to improve the quality of this specific dataset for the intended task?
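Each of the three data problems in the scenario (short messages, non-English text, toxic language) can be written as a simple filtering rule. The following is a minimal sketch for illustration only, not a recommended pipeline: the function names, the `BLOCKLIST` contents, and the ASCII-based language check are all assumptions made here. The language check in particular is a crude stand-in that only catches non-Latin scripts, and a real project would more likely use a trained language identifier and a toxicity classifier.

```python
import re

# Hypothetical placeholder blocklist; a real pipeline would more likely
# use a trained toxicity classifier than a word list.
BLOCKLIST = {"idiot", "stupid"}
MIN_WORDS = 5  # the scenario mentions messages shorter than five words


def is_long_enough(text, min_words=MIN_WORDS):
    """Keep messages with at least `min_words` whitespace-separated words."""
    return len(text.split()) >= min_words


def is_probably_english(text, threshold=0.9):
    """Crude heuristic (an assumption): share of ASCII letters among all letters.

    This only rejects text in non-Latin scripts; it cannot tell English
    from, say, Spanish or French.
    """
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return False
    ascii_letters = sum(1 for c in letters if c.isascii())
    return ascii_letters / len(letters) >= threshold


def is_non_toxic(text):
    """Reject messages containing any blocklisted word."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return words.isdisjoint(BLOCKLIST)


def apply_filters(messages):
    """Return the kept messages and a count of removals per rule."""
    kept = []
    removed = {"too_short": 0, "non_english": 0, "toxic": 0}
    for message in messages:
        if not is_long_enough(message):
            removed["too_short"] += 1
        elif not is_probably_english(message):
            removed["non_english"] += 1
        elif not is_non_toxic(message):
            removed["toxic"] += 1
        else:
            kept.append(message)
    return kept, removed
```

Counting removals per rule shows how much data each rule discards, which helps when judging the trade-offs of a given rule on a specific dataset.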
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Unintended Consequences of Data Filtering
Designing Filtering Rules for a Specialized AI