Learn Before
Evaluating a Sparse Reward Strategy
A team is training a language model to generate short stories. The training process provides a single quality score only after the entire story is written. For all intermediate steps (e.g., after each word is generated), the score is zero. Describe one potential advantage and one potential disadvantage of this training approach.
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A team is training a text-generation model where a single quality score is assigned only after a complete multi-sentence response is generated. For all intermediate steps (i.e., before the final word), a default score of 0 is used. The team notices the model struggles to maintain a consistent narrative thread throughout its responses. Which statement best analyzes the relationship between this scoring method and the model's behavior?
A language model is being trained using a reward model that provides a single quality score for a complete, generated response. If the model generates the four-token sequence ['The', 'cat', 'sat', '.'], which of the following reward lists best represents the standard sparse reward assignment for this process, where r_t is the reward at step t?
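The standard sparse assignment described above (zero reward at every intermediate step, with the full quality score arriving only on the final token) can be sketched in a few lines. This is an illustrative sketch only; the function name and the example score of 0.9 are assumptions, not part of the original question:

```python
def sparse_rewards(tokens, final_score):
    """Return per-step rewards: 0 for every step except the last,
    which receives the single sequence-level quality score."""
    rewards = [0.0] * len(tokens)
    rewards[-1] = final_score  # only the final step is rewarded
    return rewards

# Example with the four-token sequence from the question:
tokens = ['The', 'cat', 'sat', '.']
print(sparse_rewards(tokens, 0.9))  # [0.0, 0.0, 0.0, 0.9]
```

Under this scheme, the reward list for the question's sequence is [0, 0, 0, R], where R is the score assigned to the completed response.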