Multiple Choice

A research team develops a new method to evaluate a language model's ability to process documents that are thousands of pages long. Their process involves dividing each long document into individual paragraphs, asking a specific question about the content of each paragraph in isolation, and then calculating the average accuracy across all questions. The team argues that a high average score demonstrates the model's superior long-context capabilities. Which of the following best evaluates the team's conclusion?

0

1

Updated 2025-09-26

Contributors are:

Who are from:

Tags

Ch.3 Prompting - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Evaluation in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science