1Cademy - A team of engineers is developing a system to automatically provide feedback to a language model during its training phase. They decide to use a simple reward mechanism: a positive signal is given if the model's final output is correct, and a negative signal is given if it is incorrect. For which of the following tasks would this reward mechanism be the most effective and least ambiguous?

Learn Before

Example of Outcome-Based Reward for a Mathematical Task

Multiple Choice

A team of engineers is developing a system to automatically provide feedback to a language model during its training phase. They decide to use a simple reward mechanism: a positive signal is given if the model's final output is correct, and a negative signal is given if it is incorrect. For which of the following tasks would this reward mechanism be the most effective and least ambiguous?
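The reward mechanism described in the question can be sketched for a mathematical task, the setting named in the prerequisite topic above. This is a minimal illustration, not the engineers' actual system: the function names, the +1.0 / -1.0 reward values, and the rule of treating the last number in the output as the final answer are all assumptions made for the example.

```python
import re
from fractions import Fraction


def extract_final_number(text):
    """Return the last number found in the text as a Fraction, or None.

    Assumption: the model's final answer is the last number it writes.
    """
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    return Fraction(matches[-1]) if matches else None


def outcome_reward(model_output, reference_answer):
    """Hypothetical outcome-based reward: +1.0 if the final answer matches
    the reference, -1.0 otherwise. Intermediate steps are never inspected."""
    predicted = extract_final_number(model_output)
    if predicted is not None and predicted == Fraction(reference_answer):
        return 1.0
    return -1.0
```

Under these assumptions, `outcome_reward("3 * 4 = 12", "12")` gives the positive signal and `outcome_reward("3 * 4 = 13", "12")` gives the negative one. The check is unambiguous only because the task has a single verifiable final answer; the same function would have nothing well-defined to compare against for an open-ended output.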

Updated 2025-09-26
