Evaluating a Data Strategy for a Global Chatbot
A tech company is developing a large language model to power a global customer service chatbot. Their goal is to create a model that is fair and effective for users worldwide. To do this, they collect a vast dataset of customer service transcripts from North America, Europe, and Asia. However, to standardize the training process, they use an automated translation service to convert all non-English transcripts into English before feeding them into the model. Critically evaluate this data collection strategy. What specific types of bias might this 'translate-to-English' approach introduce or fail to mitigate, and why?
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An AI development team trains a large language model to assist with writing professional emails. After deployment, they receive feedback that the model's suggestions for users with non-Western names often sound overly casual or grammatically awkward, while suggestions for users with common Western names are consistently high-quality. The training data consisted primarily of a large, publicly available email corpus from a North American tech company. What is the most likely reason for this performance discrepancy, and which action would be the most effective first step to address it?
Critique of a Bias Mitigation Strategy