Multiple Choice

A research team is comparing two language models on a task that involves reading a 50-page story and then answering a question about a detail mentioned in the first chapter. Model A is specifically designed to handle very long texts, while Model B is a powerful general-purpose model. The team observes that Model B achieves a slightly lower (better) perplexity score across the entire 50-page text than Model A. However, Model A consistently answers the final question correctly, while Model B fails. What is the most likely reason for this discrepancy?
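The discrepancy hinges on how perplexity is computed: it averages per-token loss over the whole text, so a handful of tokens that require recalling a first-chapter detail barely move the average. A minimal sketch, using made-up per-token loss values chosen purely for illustration, shows how a model can win on overall perplexity while failing badly on exactly those long-range tokens:

```python
import math

# Hypothetical per-token negative log-likelihoods (in nats) over the story.
# All numbers below are assumptions for illustration, not measurements.
n_tokens = 50_000          # total tokens in the 50-page story
n_critical = 20            # tokens whose prediction requires first-chapter recall

# Model A (long-context): slightly worse on ordinary tokens, strong on critical ones.
nll_a_ordinary, nll_a_critical = 2.10, 1.0
# Model B (general-purpose): slightly better on ordinary tokens, fails critical ones.
nll_b_ordinary, nll_b_critical = 2.00, 12.0

def perplexity(nll_ordinary, nll_critical):
    # Perplexity = exp(mean NLL); the 20 critical tokens are diluted by 50,000.
    total = nll_ordinary * (n_tokens - n_critical) + nll_critical * n_critical
    return math.exp(total / n_tokens)

ppl_a = perplexity(nll_a_ordinary, nll_a_critical)
ppl_b = perplexity(nll_b_ordinary, nll_b_critical)
print(ppl_a, ppl_b)  # Model B's overall perplexity is lower despite its failures
```

Under these assumed numbers, Model B's average perplexity comes out lower even though its loss on the recall-dependent tokens is an order of magnitude worse, which is why a whole-text perplexity score can mask a long-range retrieval failure.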


Updated 2025-09-26


Tags

Ch.3 Prompting - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science