Diagnosing a Model's Generalization Failure
An instruction-tuned language model is trained on a massive dataset for the task of translating English sentences to French. Every training example uses the format: 'Translate this English sentence to French: [English sentence]'. When deployed, the model performs poorly on the user prompt: 'Could you render the following passage in French for me?'. Based on the principles of model generalization, explain the two distinct dimensions of variation that a robust model must handle, and diagnose the most likely reason for this specific model's failure.
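For concreteness, the two dimensions of variation can be illustrated by how training examples are templated. The sketch below (hypothetical helper and template names, not from the source) contrasts a dataset built from a single fixed instruction with one that also varies the instruction phrasing:

```python
import random

# Two dimensions a robust model must generalize over:
# 1) input variation  — the English sentences themselves
# 2) instruction variation — how the task is phrased

FIXED_TEMPLATE = "Translate this English sentence to French: {text}"

VARIED_TEMPLATES = [
    "Translate this English sentence to French: {text}",
    "Could you render the following passage in French for me? {text}",
    "Please put this into French: {text}",
    "French translation, please: {text}",
]

def build_examples(sentences, templates):
    """Pair each source sentence with a (possibly varied) instruction."""
    return [random.choice(templates).format(text=s) for s in sentences]

sentences = ["The cat sleeps.", "It is raining."]

# The failing model in the question saw only one instruction phrasing:
fixed_data = build_examples(sentences, [FIXED_TEMPLATE])

# A more robust training set also varies along the instruction dimension:
varied_data = build_examples(sentences, VARIED_TEMPLATES)
```

A model trained only on `fixed_data` can generalize across sentences (the input dimension) yet still overfit to the literal instruction string, which is the most likely cause of the failure described above.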
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An AI team is building a general-purpose chatbot. They train two different models on a large dataset of text summarization tasks.
- Model A is trained using 100,000 different articles, but every training example uses the exact same instruction: "Summarize the following text."
- Model B is trained using only 10,000 different articles, but the training examples use 1,000 varied instructions for summarization (e.g., "Give me the gist", "What are the key points?", "Provide a brief overview").
When a user gives the prompt, "Can you give me the TL;DR for this article?", which model is more likely to fail at the task, and what is the most probable reason for its failure?
Diagnosing Generalization Failure in a Legal AI