Short Answer

Analyzing Model Performance Components

An engineer is comparing two instruction-tuned language models. Model A consistently produces factually correct but stylistically poor outputs for a given instruction (c) and input (z). Model B produces stylistically excellent but often factually incorrect outputs for the same (c, z) pair. Explain how the performance metric, represented as P(c, z, y), helps in evaluating these two models beyond a simple 'correct' or 'incorrect' label for the output (y).
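
One way to make the scenario concrete is a minimal sketch that treats P(c, z, y) as a graded score rather than a binary label. Everything here is an assumption for illustration: the weighted-sum form, the names Scores and performance, the weights, and the per-dimension scores for Model A and Model B are hypothetical and are not taken from the course material or from any measurement.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    """Hypothetical per-dimension quality of an output y for a given (c, z)."""
    factuality: float  # assumed to lie in [0, 1]
    style: float       # assumed to lie in [0, 1]


def performance(scores: Scores, w_fact: float = 0.7, w_style: float = 0.3) -> float:
    """Illustrative stand-in for P(c, z, y): a weighted mix of the two dimensions.

    The weighted sum is an assumption; the real metric could be any graded function.
    """
    return w_fact * scores.factuality + w_style * scores.style


# Made-up scores matching the scenario: A is accurate but clumsy, B is polished but unreliable.
model_a = Scores(factuality=0.95, style=0.40)
model_b = Scores(factuality=0.45, style=0.95)

# With factuality weighted more heavily, Model A scores higher.
p_a = performance(model_a)
p_b = performance(model_b)

# With style weighted more heavily, the ranking flips.
p_a_style = performance(model_a, w_fact=0.3, w_style=0.7)
p_b_style = performance(model_b, w_fact=0.3, w_style=0.7)

print(f"factuality-weighted: A={p_a:.3f}, B={p_b:.3f}")
print(f"style-weighted:      A={p_a_style:.3f}, B={p_b_style:.3f}")
```

Under these assumed numbers, a binary 'correct' or 'incorrect' label would hide both how far apart the models are and why, while a graded score shows the trade-off and how the ranking depends on what the evaluator chooses to value.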

Updated 2025-10-08

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science