Short Answer

Reward Model Suitability for a Creative Task

A developer is training a language model to generate short, engaging, and original marketing slogans. They decide to use a reward system that gives a high score to slogans that human raters find creative and a low score to those they find uninspired. This system does not analyze the intermediate steps the model took to generate the slogan. Explain why this focus on the final output is a particularly effective strategy for this specific training goal.
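The reward setup described above can be sketched in a few lines. This is a minimal illustration, not the developer's actual system: the function name `outcome_reward`, the 0-to-1 rating scale, and the example slogans and ratings are all assumptions made for the sketch. Note that the function receives only ratings of the finished slogan and has no access to the generation steps.

```python
from statistics import mean

def outcome_reward(rater_scores):
    """Return a single scalar reward for a finished slogan.

    rater_scores: creativity ratings from human raters, each assumed to lie
    in [0, 1] (the scale is an assumption; the question does not specify one).
    Only the final output is scored; intermediate steps are never inspected.
    """
    if not rater_scores:
        raise ValueError("need at least one rating")
    return mean(rater_scores)

# Hypothetical ratings for two finished slogans
rewards = {
    "Taste the sunrise.": outcome_reward([0.9, 0.8, 1.0]),
    "Our product is good.": outcome_reward([0.2, 0.1, 0.3]),
}
```

In this sketch the slogan the raters found creative receives the higher reward, regardless of how the model arrived at it.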


Updated 2025-10-07


Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science