An AI development team uses a two-stage system for a text generation task. First, a base generator creates a list of 10 possible outputs. Second, a separate scoring component reranks these 10 outputs to select the best one. The team investigates a case where the system produced a poor final output and makes the following observations:
- The final output selected by the scoring component was nonsensical.
- A manual review of the initial 10 generated outputs reveals that one of them was a high-quality, correct response.
- The scoring component assigned a very low score to this high-quality response.
Based on these observations, what is the most likely source of the system's failure in this specific case?
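The two-stage setup and the team's investigation can be sketched in a few lines. This is a minimal illustration, not the team's actual system: `rerank`, `locate_failure`, the `is_good` reviewer check, and the toy length-based scorer are all hypothetical names and assumptions introduced here. The idea is to check whether a good candidate existed in the generated list before looking at what the scorer selected.

```python
from typing import Callable, List, Tuple


def rerank(candidates: List[str],
           score: Callable[[str], float]) -> Tuple[str, List[Tuple[str, float]]]:
    """Stage two: score every candidate and return the top one plus the full ranking."""
    scored = [(c, score(c)) for c in candidates]
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return ranked[0][0], ranked


def locate_failure(candidates: List[str],
                   score: Callable[[str], float],
                   is_good: Callable[[str], bool]) -> str:
    """Attribute a bad final output to a stage (hypothetical diagnostic).

    `is_good` stands in for the manual review described in the question.
    """
    if not any(is_good(c) for c in candidates):
        # No acceptable candidate was ever produced: stage one is at fault.
        return "generator"
    selected, _ = rerank(candidates, score)
    if is_good(selected):
        return "none"
    # A good candidate existed but was not selected: stage two is at fault.
    return "scorer"


# Toy data (assumed): one correct answer among 10 candidates, nine of them nonsense.
candidates = ["The capital of France is Paris."] + [f"zz qq {i} " * 5 for i in range(9)]


def is_good(text: str) -> bool:
    return "Paris" in text


def flawed_score(text: str) -> float:
    # Deliberately poor scorer: rewards length, so it ranks the correct answer last.
    return float(len(text))


print(locate_failure(candidates, flawed_score, is_good))
```

In this toy run the good candidate is present but receives the lowest score, which mirrors the three observations in the question.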
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Diagnosing AI Generation Errors
An AI development team is analyzing failures in their two-stage text generation system, which first generates multiple candidate responses and then uses a separate component to select the best one. Match each failure scenario with the most likely underlying cause.