Multiple Choice

An instruction-tuned model is evaluated on a specific task: summarizing legal documents. The goal is to achieve intra-task generalization, which is formally defined as the model's average performance on a set of new inputs (Z) exceeding a predefined threshold (ε).
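The definition in the previous paragraph can be written as a single inequality. The sketch below is only one way to write it. The notation is assumed, not taken from the source: θ stands for the model, and Perf(θ, z) is a hypothetical per-input performance score on input z.

```latex
% Intra-task generalization (assumed notation): the average score over the
% new inputs Z must strictly exceed the threshold \epsilon.
\frac{1}{|Z|} \sum_{z \in Z} \mathrm{Perf}(\theta, z) > \epsilon
```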

The evaluation set Z consists of 100 legal documents that the model has never seen. The performance threshold (ε) is set to 0.85.

  • Model A correctly summarizes 92 of the 100 new documents.
  • Model B correctly summarizes 81 of the 100 new documents.
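Applying the definition is a matter of averaging and comparing. The sketch below assumes binary per-document scores (1 for a correct summary, 0 otherwise), so average performance reduces to accuracy. The function names `average_performance` and `generalizes_intra_task` are hypothetical illustrations, not part of any library.

```python
def average_performance(scores):
    """Mean per-input score over the evaluation set Z."""
    if not scores:
        raise ValueError("evaluation set Z must be non-empty")
    return sum(scores) / len(scores)


def generalizes_intra_task(scores, epsilon):
    """True if the average performance on Z strictly exceeds epsilon."""
    return average_performance(scores) > epsilon


# Assumed binary scoring: 1 = correct summary, 0 = incorrect summary.
EPSILON = 0.85
model_a = [1] * 92 + [0] * 8   # 92 of 100 documents correct
model_b = [1] * 81 + [0] * 19  # 81 of 100 documents correct

for name, scores in [("Model A", model_a), ("Model B", model_b)]:
    avg = average_performance(scores)
    print(name, avg, generalizes_intra_task(scores, EPSILON))
```

The comparison is strict (`>`) because the definition says the average must *exceed* ε. A score exactly equal to 0.85 would therefore not count.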

Based on the formal definition, which of the following conclusions is correct?


Updated 2025-10-02


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science