Learn Before
A machine learning team wants to improve a base language model's ability to follow instructions. They have already trained a separate, reliable 'reward model' that can score the quality of any given response. The team wants to use this reward model to enhance the base model's performance directly through a data-centric approach, avoiding more complex training paradigms such as RLHF. Which of the following strategies describes the most effective and direct way to use the reward model for this purpose?
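For intuition, the data-centric strategy this question points toward is commonly called rejection sampling, or best-of-n selection: sample several candidate responses, score each with the reward model, and keep only the top-scoring one. A minimal sketch, where `generate_response` and `reward_model_score` are hypothetical stand-ins for the base model and the trained reward model:

```python
import random

def generate_response(prompt: str) -> str:
    # Hypothetical stand-in for sampling one response from the base model.
    return f"candidate-{random.randint(0, 9999)} for: {prompt}"

def reward_model_score(prompt: str, response: str) -> float:
    # Hypothetical stand-in for the trained reward model's quality score.
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    """Sample n candidates; keep the one the reward model rates highest."""
    candidates = [generate_response(prompt) for _ in range(n)]
    return max(candidates, key=lambda r: reward_model_score(prompt, r))

print(best_of_n("Explain how to follow instructions carefully.", n=4))
```

Raising n trades extra inference-time compute for a better chance of surfacing a high-reward response; the selected responses can then serve as fine-tuning data for the base model.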
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Comparison of Rejection Sampling and RLHF
Adoption of Rejection Sampling in LLMs
Analyzing a Flawed Model Improvement Pipeline
You are tasked with improving a language model's ability to generate helpful and harmless responses. You decide to use a method that involves generating multiple potential responses to a prompt, scoring them with a separate quality-assessment model, and then using only the best-scoring responses to further train the original model. Arrange the following steps of this process in the correct logical order.
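For orientation, the steps chain into a single rejection-sampling pipeline: generate candidates, score them all, keep only the best per prompt, then fine-tune on the survivors. The sketch below is a minimal illustration of that ordering; `generate_response`, `reward_model_score`, and `fine_tune` are hypothetical placeholders, not any particular library's API.

```python
import random

# Hypothetical stand-ins, as in the sketch above.
def generate_response(prompt: str) -> str:
    return f"candidate-{random.randint(0, 9999)}"

def reward_model_score(prompt: str, response: str) -> float:
    return random.random()

def fine_tune(pairs: list[tuple[str, str]]) -> None:
    # Placeholder: a real pipeline would run supervised fine-tuning here.
    print(f"fine-tuning on {len(pairs)} selected (prompt, response) pairs")

def rejection_sampling_pipeline(prompts: list[str], n: int = 8) -> None:
    dataset = []
    for prompt in prompts:
        # 1. Sample n candidate responses from the base model.
        candidates = [generate_response(prompt) for _ in range(n)]
        # 2. Score every candidate with the separate reward model.
        # 3. Keep only the highest-scoring response for this prompt.
        best = max(candidates, key=lambda r: reward_model_score(prompt, r))
        dataset.append((prompt, best))
    # 4. Further train (fine-tune) the original model on the filtered pairs.
    fine_tune(dataset)

rejection_sampling_pipeline(["Summarize rejection sampling."], n=4)
```

The ordering constraint the question targets: scoring must see every candidate before filtering, and fine-tuning runs only after the filtered dataset is complete.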