Case Study

Diagnosing Unintended Model Behavior After Adaptation

A development team adapts a powerful, pre-existing language model to create a specialized 'Fact-Checking Assistant'. They prepare a large dataset where each sample consists of a statement (the input) and a single, corresponding label: 'True', 'False', or 'Unverifiable' (the desired output). After extending the model's training on this new dataset, they find it performs with high accuracy on fact-checking tasks. However, they also discover an unintended side effect: when users ask the model general questions that are not simple statements (e.g., 'Can you explain the water cycle?'), the model frequently responds with 'True' or 'False' instead of providing a relevant explanation. Based on the adaptation process described, what is the most likely cause of this undesirable behavior?
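To make the adaptation process concrete, here is a minimal sketch of the kind of fine-tuning run described above, assuming the team used standard supervised fine-tuning of a causal language model with Hugging Face Transformers. The base model ("gpt2"), the three example rows, and the hyperparameters are illustrative placeholders, not the team's actual setup. The key point it illustrates: every training target is a single short label, so the loss only ever rewards emitting 'True', 'False', or 'Unverifiable'.

```python
# Illustrative sketch only: supervised fine-tuning on (statement, label) pairs.
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
from datasets import Dataset

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # stand-in for the pre-existing model
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Every sample maps a statement to exactly one short label.
examples = [
    {"statement": "Water boils at 100 degrees Celsius at sea level.", "label": "True"},
    {"statement": "The Great Wall of China is visible from the Moon.", "label": "False"},
    {"statement": "There is intelligent life outside our galaxy.", "label": "Unverifiable"},
]

def tokenize(example):
    # The only completion the model is ever trained to produce is the label itself.
    text = f"Statement: {example['statement']}\nVerdict: {example['label']}{tokenizer.eos_token}"
    enc = tokenizer(text, truncation=True, max_length=64, padding="max_length")
    enc["labels"] = enc["input_ids"].copy()  # simplified: loss over the whole sequence
    return enc

dataset = Dataset.from_list(examples).map(tokenize, remove_columns=["statement", "label"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="fact_checker", num_train_epochs=3,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
)
trainer.train()  # after many such updates, the model strongly prefers one of the three labels
```

Because no sample in the dataset ever pairs an open-ended question with an explanatory answer, extended training of this kind narrows the model's output distribution toward the label format, which is the behavior the question asks you to diagnose.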


Updated 2025-10-06


Tags

Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Ch.2 Generative Models - Foundations of Large Language Models
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science