Learn Before
Unintended Consequences of Data Filtering
A development team is fine-tuning a language model to act as a programming assistant. After applying a set of predefined filtering rules to their dataset, they notice the fine-tuned model struggles to generate simple, concise code solutions (e.g., 'one-liners') and fails to explain basic programming concepts effectively. Based on the filtering rules listed in the case study, identify which rule is the most likely cause of this performance degradation and explain your reasoning.
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A team is preparing a dataset to fine-tune a large language model for a customer service chatbot. The raw data, collected from public online forums, contains many instances of toxic language, messages shorter than five words, and conversations that are not in English. The primary goal is to improve the model's ability to provide helpful, safe, and coherent responses in English. Which of the following filtering rules would be the most effective first step to improve the quality of this specific dataset for the intended task?
Unintended Consequences of Data Filtering
Designing Filtering Rules for a Specialized AI