Learn Before
Reward Model Suitability for a Creative Task
A developer is training a language model to generate short, engaging, and original marketing slogans. They decide to use a reward system that gives a high score to slogans that human raters find creative and a low score to those they find uninspired. This system does not analyze the intermediate steps the model took to generate the slogan. Explain why this focus on the final output is a particularly effective strategy for this specific training goal.
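The outcome-based setup described above can be sketched in a few lines. This is a minimal illustration, not an implementation from the course: the function name `outcome_reward` and the 1–5 rater scale are assumptions chosen for clarity.

```python
# Minimal sketch of an outcome-only reward for slogan generation.
# The names (outcome_reward, rater_scores) and the 1-5 rating scale
# are illustrative assumptions, not part of the course material.

def outcome_reward(slogan: str, rater_scores: list[int]) -> float:
    """Score a finished slogan from human creativity ratings (1-5).

    Only the final text is judged; the generation path that produced it
    (sampling choices, intermediate drafts, etc.) is never inspected,
    so any strategy that lands on a creative slogan is rewarded equally.
    """
    return sum(rater_scores) / len(rater_scores)

# Two slogans receive rewards based purely on the final text.
creative_reward = outcome_reward("Sip the impossible.", [5, 4, 5])
dull_reward = outcome_reward("Buy our good drink.", [1, 2, 1])
```

Because the reward depends only on rater judgments of the final slogan, the model is free to reach a creative result by any route, which suits a task where there is no single correct procedure.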
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Example of an Outcome-Based Reward Model in Mathematics
Insufficiency of Outcome-Based Rewards for Complex Reasoning
A company is training a language model to act as an automated assistant for processing loan applications. The model must follow a specific, legally mandated, multi-step procedure to ensure fairness and compliance (e.g., checking credit history, verifying income, providing specific disclosures). The company decides to train the model using a system that provides a large positive reward only if the final loan decision (approve/deny) is correct based on the applicant's overall profile. What is the most significant weakness of this training strategy?
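The weakness probed by this question can be made concrete with a short sketch. This is a hypothetical illustration, assuming made-up step names and a binary reward; none of these identifiers come from the course material.

```python
# Hypothetical sketch: an outcome-only reward scores just the final
# loan decision, so it cannot penalize skipped procedural steps.
# All names (REQUIRED_STEPS, outcome_only_reward) are illustrative.

REQUIRED_STEPS = ["check_credit_history", "verify_income", "provide_disclosures"]

def outcome_only_reward(steps_taken: list[str],
                        decision: str,
                        correct_decision: str) -> float:
    # Only the final decision is compared to the label;
    # steps_taken is ignored entirely.
    return 1.0 if decision == correct_decision else 0.0

# A compliant trajectory and a non-compliant shortcut earn identical
# reward, so the model gets no signal to follow the mandated procedure.
compliant = outcome_only_reward(REQUIRED_STEPS, "approve", "approve")
shortcut = outcome_only_reward([], "approve", "approve")
```

Here `compliant` and `shortcut` are both 1.0: the reward is blind to whether the legally required steps were performed, which is precisely the gap a process-based reward would close.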
Evaluating Reward Model Suitability
Reward Model Suitability for a Creative Task