Multiple Choice

A machine learning team wants to improve a base language model's ability to follow instructions. They have already trained a separate, reliable reward model that can score the quality of any given response. The team wants to use this reward model to improve the base model directly through a data-centric approach, avoiding more complex training paradigms. Which of the following strategies describes the most direct and effective way to use the reward model for this purpose?

Updated 2025-10-10

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science