Short Answer

Evaluating a Sparse Reward Strategy

A team is training a language model to generate short stories. The training process provides a single quality score only after the entire story is written. For all intermediate steps (e.g., after each word is generated), the score is zero. Describe one potential advantage and one potential disadvantage of this training approach.

0

1

Updated 2025-10-06

Contributors are:

Who are from:

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science