Learn Before
Evaluating AI Response Quality
Review the following two AI-generated responses to the same prompt. Evaluate which response provides a stronger example of a system assessing its own output to improve accuracy, and justify your choice.
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Methods for Activating Self-Reflection in LLMs
An AI model is asked, 'What is the approximate distance from the Earth to the Moon?' It provides two consecutive responses:
- Response 1: 'The distance from the Earth to the Moon is about 238,900 kilometers.'
- Response 2: 'Upon review, my previous answer was imprecise. The distance is in miles, not kilometers. The correct average distance is approximately 238,900 miles, which is about 384,400 kilometers. Stating the unit correctly is crucial for accuracy.'
Which of the following best analyzes the process demonstrated in Response 2?
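The correction in Response 2 follows a "predict, then verify" pattern: a draft answer is checked against an independent constraint, and the draft is revised only if it fails that check. Below is a minimal, rule-based sketch of that loop. It is not how an LLM reflects internally; there, the critique step is usually another prompt. The names `to_km`, `verify`, and `self_correct`, the reference value of 384,400 km, and the 1% tolerance are all illustrative assumptions.

```python
# Hypothetical sketch of predict-then-verify self-correction for the
# Earth-Moon distance example. Reference value and tolerance are assumptions.

KM_PER_MILE = 1.609344   # exact definition of the international mile
REFERENCE_KM = 384_400   # commonly cited average Earth-Moon distance


def to_km(value: float, unit: str) -> float:
    """Convert a distance to kilometers; only 'km' and 'mi' are supported."""
    if unit == "km":
        return value
    if unit == "mi":
        return value * KM_PER_MILE
    raise ValueError(f"unsupported unit: {unit}")


def verify(value: float, unit: str, tolerance: float = 0.01) -> bool:
    """Return True if the claim lies within a relative `tolerance` of the reference."""
    km = to_km(value, unit)
    return abs(km - REFERENCE_KM) / REFERENCE_KM <= tolerance


def self_correct(value: float, unit: str) -> tuple[float, str]:
    """Keep the draft if it verifies; otherwise try reinterpreting its unit."""
    if verify(value, unit):
        return value, unit
    for alt in ("km", "mi"):
        if alt != unit and verify(value, alt):
            return value, alt
    raise ValueError("no consistent interpretation found")


if __name__ == "__main__":
    # Response 1's draft: 238,900 "kilometers" fails the check,
    # but 238,900 miles (about 384,472 km) passes it.
    print(self_correct(238_900, "km"))
```

Running the script prints `(238900, 'mi')`, which mirrors the revision in Response 2. The number is kept and only the unit is changed, because that is the smallest edit that makes the claim consistent with the reference.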
Mechanism of AI Self-Correction
You are reviewing a proposed architecture for an i...
You’re designing an internal LLM assistant for a f...
You’re leading an internal rollout of an LLM assis...
In an LLM-based customer support assistant, the mo...
Design Review: Combining Tool Use, DTG, and Predict-then-Verify for a High-Stakes API Workflow
Designing a Reliable LLM Workflow for Real-Time Decisions
Post-Incident Analysis: Preventing Confidently Wrong API-Backed Answers
Case Study: Shipping a Tool-Using LLM Assistant with Built-In Verification Under Latency Constraints
Case Review: Preventing Incorrect Refund Commitments in an LLM + Payments API Assistant
Case Study: Preventing Hallucinated Compliance Claims in an API-Enabled LLM for Vendor Risk Reviews