Learn Before
Diagnosing Unintended Model Behavior After Adaptation
A development team adapts a powerful, pre-existing language model to create a specialized 'Fact-Checking Assistant'. They prepare a large dataset where each sample consists of a statement (the input) and a single, corresponding label: 'True', 'False', or 'Unverifiable' (the desired output). After extending the model's training on this new dataset, they find it performs with high accuracy on fact-checking tasks. However, they also discover an unintended side effect: when users ask the model general questions that are not simple statements (e.g., 'Can you explain the water cycle?'), the model frequently responds with 'True' or 'False' instead of providing a relevant explanation. Based on the adaptation process described, what is the most likely cause of this undesirable behavior?
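To make the cause concrete, here is a minimal sketch of what the team's fine-tuning data might look like. The field names and example rows below are hypothetical, not taken from the question: the point is only that every supervised target in the adaptation set is one of the three labels, so fine-tuning pushes the model's output distribution toward those strings for any input, including open-ended questions.

```python
# Hypothetical illustration of the fact-checking SFT dataset described above.
# Field names and example rows are invented for demonstration purposes.

fact_checking_dataset = [
    {"input": "The Pacific Ocean is the largest ocean on Earth.", "output": "True"},
    {"input": "Humans have three lungs.", "output": "False"},
    {"input": "There is life on at least one exoplanet.", "output": "Unverifiable"},
    # ... thousands more (statement, label) pairs, all sharing the same three outputs
]

# Every desired output the model sees during adaptation is a single short label.
unique_targets = {example["output"] for example in fact_checking_dataset}
print(unique_targets)  # {'True', 'False', 'Unverifiable'}

# Because fine-tuning maximizes the likelihood of exactly these targets and
# nothing else, the updated parameters are biased toward emitting one of these
# labels for any input, even questions that are not verifiable statements.
```

Viewed this way, the side effect is a predictable consequence of training exclusively on one narrow output format rather than a flaw in the model's original question-answering ability.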
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Ch.2 Generative Models - Foundations of Large Language Models
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Computational Expense of SFT for Large Language Models
Objective of Supervised Fine-Tuning
Computational Efficiency of Fine-Tuning Compared to Pre-training
Suitability of Fine-Tuning for Aligning with Human Values
Definition of LLM Alignment
Supervised Fine-Tuning for LLM Alignment
A company has a powerful, general-purpose language model that can write essays, answer questions, and summarize articles. They want to adapt this model to perform a new, specialized task: generating concise and helpful summaries of customer support tickets. Which of the following strategies represents the most direct and effective approach to adapt the model's internal parameters for this specific purpose?
Designing a Dataset for Model Behavior Adaptation
Embedding Task Knowledge into LLM Parameters via Fine-Tuning
Supervised Fine-Tuning (SFT) as an Example of Labeled Data Fine-Tuning
Diagnosing Unintended Model Behavior After Adaptation