Multiple Choice

An AI team fine-tunes a language model exclusively on a dataset for a single task: translating English legal documents into French. The model is then evaluated on two test sets.

  • Test Set A: A new, unseen collection of English legal documents to be translated into French.
  • Test Set B: A collection of diverse tasks, such as writing Python code, composing poetry, and summarizing news articles.

The model performs very well on Test Set A but poorly on Test Set B. What does this evaluation reveal about the model's ability to generalize?

Updated 2025-10-06

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science