Critique of Reranking Effectiveness
A research team proposes to improve the output of a large language model by generating 100 candidate responses and using a reward model to select the best one. Critique this proposal. In your response, identify the primary assumption the plan relies on for success, and explain the specific circumstances under which the approach is most likely to fail, even with a perfect reward model.
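For reference, the proposed procedure (often called best-of-N reranking) can be sketched in a few lines. This is only a toy illustration under stated assumptions, not the team's actual system: `best_of_n`, `make_generator`, and `perfect_reward` are hypothetical names, each "response" is reduced to a single quality score drawn from a Gaussian, and the reward model is assumed to rank candidates perfectly. The sketch compares a generator with spread-out candidates against one whose candidates are nearly identical.

```python
import random


def best_of_n(generate, reward, prompt, n=100):
    """Sample n candidates and return (best candidate, all candidates)."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=reward), candidates


def make_generator(mean, spread, rng):
    """Hypothetical generator: a 'response' is just a scalar quality score."""
    return lambda prompt: rng.gauss(mean, spread)


def perfect_reward(quality):
    """Assumed-perfect reward model: it scores a response by its true quality."""
    return quality


rng = random.Random(0)
diverse = make_generator(mean=0.5, spread=0.2, rng=rng)      # varied candidates
collapsed = make_generator(mean=0.5, spread=0.001, rng=rng)  # near-duplicates

best_d, cands_d = best_of_n(diverse, perfect_reward, "some prompt")
best_c, cands_c = best_of_n(collapsed, perfect_reward, "some prompt")

# Gain of the selected response over the average candidate in its pool.
gain_d = best_d - sum(cands_d) / len(cands_d)
gain_c = best_c - sum(cands_c) / len(cands_c)

print(f"gain with diverse candidates:   {gain_d:.4f}")
print(f"gain with collapsed candidates: {gain_c:.4f}")
```

In this toy setup the selector can only return something that is already in the pool, so the gain from reranking is bounded by how much the candidates differ from one another.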
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Strategies to Enhance Output Diversity for Reranking
Balancing Candidate Quality and Diversity in Reranking
An engineering team implements a system to improve a language model's output. For each user query, the system generates 10 candidate responses and then uses a highly accurate reward model to select the best one. Despite the high accuracy of the reward model, the team observes that the final selected response is rarely a significant improvement over any of the other 9 candidates. Which of the following is the most likely underlying cause for this lack of significant improvement?
Diagnosing Reranking System Performance
Evaluating Candidate Sets for Selection