Case Study

Evaluating LLM Response Completeness

A developer is testing a large language model's ability to solve multi-step problems. They provide a prompt that concludes with a phrase intended to elicit a detailed reasoning process. Below are two responses the model generated for the same prompt in separate test runs. Evaluate the two responses: which is more effective for a user seeking a definitive solution, and what specific pitfall does Response A illustrate?
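For concreteness, the kind of test run described above might be scripted as in the minimal sketch below. Everything in it is an assumption made for illustration: generate stands in for whatever model call the developer actually uses, REASONING_SUFFIX is a placeholder for the unnamed elicitation phrase, and the completeness markers are crude heuristics rather than a validated rubric.

    from typing import Callable

    # Placeholder for the unnamed phrase the developer appends to the
    # prompt to elicit step-by-step reasoning; the wording is an assumption.
    REASONING_SUFFIX = "Please reason through the problem step by step."

    def looks_complete(response: str) -> bool:
        """Crude heuristic: count a response as complete only if it commits
        to a definitive final answer instead of trailing off mid-reasoning.
        The marker strings are illustrative assumptions."""
        markers = ("final answer", "the answer is", "therefore")
        return any(m in response.lower() for m in markers)

    def run_trials(prompt: str, generate: Callable[[str], str], runs: int = 2) -> list:
        """Send the same prompt to the model several times and record
        whether each response reaches a definitive conclusion."""
        full_prompt = f"{prompt}\n\n{REASONING_SUFFIX}"
        results = []
        for i in range(runs):
            response = generate(full_prompt)
            results.append({
                "run": i,
                "complete": looks_complete(response),
                "response": response,
            })
        return results

A harness like this turns the case study's question into something measurable: across repeated runs of the same prompt, each response either commits to a definitive answer or it does not, which is exactly the distinction a user seeking a solution cares about.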


Updated 2025-10-02


Tags: Ch.3 Prompting - Foundations of Large Language Models, Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences, Evaluation in Bloom's Taxonomy, Cognitive Psychology, Psychology, Social Science, Empirical Science, Science