Learn Before
The Role of a Baseline in Model Evaluation
In the process of using a less capable model to help train a more powerful one, why is it essential to first establish the performance of the less capable model on a test set? Explain the role this initial measurement plays in evaluating the final outcome.
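One way to see why the initial measurement matters is to look at how it enters a Performance Gap Recovered (PGR) calculation. The sketch below is illustrative only. It assumes the common definition of PGR as the share of the gap between the weak baseline and a strong-model ceiling that the weakly supervised strong model closes. The function name, the ceiling value, and all accuracies are hypothetical and not taken from the source.

```python
def performance_gap_recovered(p_weak: float, p_weak_to_strong: float, p_ceiling: float) -> float:
    """Fraction of the weak-to-ceiling gap closed by the weakly supervised strong model.

    p_weak           : accuracy of the weak model on the test set (the baseline)
    p_weak_to_strong : accuracy of the strong model trained with weak supervision
    p_ceiling        : accuracy of the strong model trained with ground-truth labels
                       (an assumed reference point, not given in the text above)
    """
    gap = p_ceiling - p_weak
    if gap <= 0:
        # Without a positive gap there is nothing to recover, so the ratio is undefined.
        raise ValueError("ceiling must exceed the weak baseline")
    return (p_weak_to_strong - p_weak) / gap


# Hypothetical accuracies, chosen only for illustration.
p_weak = 0.60
p_weak_to_strong = 0.75
p_ceiling = 0.80

pgr = performance_gap_recovered(p_weak, p_weak_to_strong, p_ceiling)
print(f"PGR = {pgr:.2f}")
```

The baseline appears in both the numerator and the denominator, so without it the strong model's final accuracy cannot be read as an improvement over its supervisor.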
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Performance Gap Recovered (PGR)
Establishing a Performance Baseline
A research team is developing a powerful language model (a 'strong model') for a complex task. To guide its training, they first use a smaller, less capable model (a 'weak model'). They evaluate this weak model on a dedicated test set, where it achieves an accuracy of 72%. After the strong model is supervised by the weak model, the strong model achieves an accuracy of 85% on the same test set. In this scenario, what value represents the weak performance baseline (Pweak) used to measure the overall improvement?