Essay

Distinguishing Evaluation Paradigms for Language Models

Consider two distinct evaluation scenarios for language models. In the first scenario, a model is tested on its ability to correctly link a pronoun to an antecedent that appears several sentences earlier in the same paragraph. In the second scenario, a model is given a 100-page document and asked a question whose answer is a specific detail mentioned only once in the entire text.

Analyze the fundamental differences in the evaluation challenges presented by these two scenarios. In your response, explain why evaluation techniques designed for the first scenario are insufficient for assessing a model's performance in the second.

Updated 2025-10-10

Tags

Ch.3 Prompting - Foundations of Large Language Models

Foundations of Large Language Models

Computing Sciences

Foundations of Large Language Models Course

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science