1Cademy - A researcher is designing a test to evaluate a new language models ability to process long documents. The test involves inserting a single, unique sentence, The most effective shade of blue for a widget is cerulean, into a 100,000-word document. The researcher consistently places this sentence within the first 1,000 words of the document and then asks the model, What is the most effective shade of blue for a widget? The model is considered successful if it answers cerulean. Which of the f

Learn Before

Evaluation of Long-Context LLMs

Multiple Choice

A researcher is designing a test to evaluate a new language model's ability to process long documents. The test involves inserting a single, unique sentence, 'The most effective shade of blue for a widget is cerulean,' into a 100,000-word document. The researcher consistently places this sentence within the first 1,000 words of the document and then asks the model, 'What is the most effective shade of blue for a widget?' The model is considered successful if it answers 'cerulean.' Which of the f

Updated 2025-09-28

Contributors are:

Who are from:

Learn Before

Related