Short Answer

Comparing Fine-Tuning Supervision Strategies

Imagine two teams are fine-tuning a language model to solve complex mathematical word problems. Team A's method provides a positive reward only if the model's final numerical answer is correct. Team B's method provides a positive reward for each valid logical step the model generates on the way to the final answer. Explain the fundamental difference between these two supervisory approaches, and identify the two key technical components that Team B would need to implement and Team A would not.
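To make the two reward signals in the scenario concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not either team's actual training code: the function names `outcome_reward` and `process_rewards`, the `is_valid_step` judge, and the toy arithmetic checker are all hypothetical and do not come from the course material.

```python
from typing import Callable, List


def outcome_reward(final_answer: float, reference: float, tol: float = 1e-6) -> float:
    """Team A style: one reward for the whole response, based only on the final answer."""
    return 1.0 if abs(final_answer - reference) <= tol else 0.0


def process_rewards(
    steps: List[str],
    is_valid_step: Callable[[List[str], str], bool],
) -> List[float]:
    """Team B style: one reward per reasoning step.

    `is_valid_step` is a hypothetical judge that receives the earlier steps
    and the current step and says whether the current step is valid.
    """
    rewards = []
    for i, step in enumerate(steps):
        rewards.append(1.0 if is_valid_step(steps[:i], step) else 0.0)
    return rewards


def toy_judge(previous: List[str], step: str) -> bool:
    """Toy stand-in for a judge: accepts steps of the form 'a + b = c' whose arithmetic holds."""
    left, right = step.split("=")
    a, b = left.split("+")
    return float(a) + float(b) == float(right)


# A response whose second step contains an arithmetic slip.
steps = ["2 + 3 = 5", "5 + 4 = 10", "10 + 1 = 11"]

team_a_signal = outcome_reward(final_answer=11.0, reference=10.0)
team_b_signal = process_rewards(steps, toy_judge)

print(team_a_signal)  # 0.0
print(team_b_signal)  # [1.0, 0.0, 1.0]
```

In this toy run, Team A's signal says only that the response as a whole failed, while Team B's signal marks the second step as the invalid one.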

Updated 2025-10-06

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science