Learn Before
Using an Oracle Model to Distinguish Model vs. Search Errors
A method for diagnosing a model's performance issues involves using a more powerful model as an 'oracle' to evaluate the outputs of an older, weaker model. The oracle selects the best response from an N-best list generated by the old model. The performance difference between this 'oracle output' and the old model's top-ranked output indicates the type of error. A significant difference suggests a 'model error,' where the model is fundamentally incapable of generating the correct answer. A small difference points to a 'search error,' meaning the model produced a good answer but failed to rank it as the top choice.
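The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Candidate` record, the `oracle_gap` and `diagnose` helpers, and the 0.10 threshold are all hypothetical names and values chosen for the example. It also assumes the oracle's judgment is available as a numeric quality score per candidate. The labels follow the interpretation stated above (large gap: model error; small gap: search error).

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Candidate:
    text: str
    model_score: float  # score the old model used to rank its N-best list
    quality: float      # quality in [0, 1] as judged by the oracle (assumed available)


def oracle_gap(nbest_lists):
    """Average quality of the old model's top-ranked output, of the
    oracle-selected output, and the gap between the two."""
    top1 = mean(max(cands, key=lambda c: c.model_score).quality
                for cands in nbest_lists)
    oracle = mean(max(cands, key=lambda c: c.quality).quality
                  for cands in nbest_lists)
    return top1, oracle, oracle - top1


def diagnose(gap, threshold=0.10):
    # The threshold is an illustrative assumption; the source does not
    # specify what counts as a "significant" difference.
    return "model error" if gap > threshold else "search error"


# Toy data: two prompts, each with a 2-best list from the old model.
nbest_lists = [
    [Candidate("a", -1.0, 0.70), Candidate("b", -1.5, 0.95)],
    [Candidate("c", -0.8, 0.60), Candidate("d", -2.0, 0.90)],
]
top1, oracle, gap = oracle_gap(nbest_lists)
print(f"top-1: {top1:.3f}, oracle: {oracle:.3f}, gap: {gap:.3f}")
print(diagnose(gap))
```

In practice the threshold would need to be calibrated against the noise of the quality metric, since small gaps can arise from evaluation variance alone.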
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Learn After
Language Model Performance Diagnosis
A development team is analyzing the performance of their language model. For a set of test prompts, they take the top 5 responses generated by their model and have a significantly more powerful, 'oracle' model select the best response from that list. They find that the average quality score of their model's original top-ranked response is 70%, while the average quality score of the response selected by the oracle model is 95%. What does this large performance gap most strongly suggest about the primary limitation of the team's model?
Interpreting Model Diagnostic Results
Search Errors in LLMs