Steering Language Model Output for Slogan Generation
A marketing team is using a language model to generate slogans. The model's initial probability for a slogan y given a product description x is given by π(y|x). The team finds that many of the highest-probability slogans are generic. To encourage more creative outputs, they decide to modify the likelihood of each slogan using the formula: New Score(y) = π(y|x) * exp(r(x, y)), where r(x, y) is a reward value assigned to each slogan.
Explain how you would design the reward function r(x, y) to achieve the team's goal. Specifically, describe what a positive reward, a negative reward, and a zero reward would signify in this context and how each would affect a slogan's final score.
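The multiplicative re-weighting in the formula above can be sketched in a few lines of Python. The slogans, base probabilities, and reward values below are invented purely for illustration; the point is that exp(r) > 1 boosts a score when r is positive, exp(r) < 1 shrinks it when r is negative, and exp(0) = 1 leaves it unchanged:

```python
import math

# Hypothetical base probabilities pi(y|x) and reward values r(x, y)
# for four candidate slogans (all values are made up for demonstration).
candidates = {
    "Quality you can trust":     (0.40, -1.0),  # generic  -> negative reward shrinks score
    "Buy now and save":          (0.30,  0.0),  # neutral  -> zero reward leaves score unchanged
    "Sip the sunrise, bottled":  (0.20,  1.5),  # creative -> positive reward boosts score
    "Your morning, reinvented":  (0.10,  1.0),
}

def new_score(prob, reward):
    """New Score(y) = pi(y|x) * exp(r(x, y))."""
    return prob * math.exp(reward)

scores = {y: new_score(p, r) for y, (p, r) in candidates.items()}
best = max(scores, key=scores.get)
```

Note that with these illustrative numbers the creative slogan overtakes the generic one even though its base probability is half as large, and the zero-reward slogan keeps exactly its original probability as its score.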
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Re-weighting a Reference Probability Distribution with a Scaled Reward
A language model is generating a completion for an input x. The model has a base probability distribution, π(y|x), for four potential completions (y). To steer the model's output, a reward function, r(x, y), is applied to create a new unnormalized score for each completion using the formula Score(y) = π(y|x) * exp(r(x, y)). Given the values below, which completion will have the highest score?

When using the formula Score(y) = π(y|x) * exp(r(x, y)) to adjust the likelihood of a potential output y, setting the reward r(x, y) to zero will cause the final score for that output to become zero, effectively eliminating it from consideration.