Multiple Choice

A researcher is evaluating a new language model that can process inputs of up to 200,000 tokens. They use a benchmark from several years ago that was designed to test whether a model could link a question to a piece of information located 500 words away in a 1,000-word text. What is the primary shortcoming of using this older benchmark to assess the new model's long-context capabilities?


Updated 2025-10-05


Tags

Ch.3 Prompting - Foundations of Large Language Models

Foundations of Large Language Models

Computing Sciences

Foundations of Large Language Models Course

Evaluation in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science