1Cademy - An agent is learning to generate a five-sentence summary of a document. It only receives a final quality score (e.g., +0.9) after the entire summary is complete. To improve training, this single final score is used to create a learning signal for each of the five sentences generated. Which of the following options best analyzes how this transformation from a single score to multiple signals works?

Learn Before

Transforming Sparse Rewards into Dense Supervision Signals

Multiple Choice

An agent is learning to generate a five-sentence summary of a document. It only receives a final quality score (e.g., +0.9) after the entire summary is complete. To improve training, this single final score is used to create a learning signal for each of the five sentences generated. Which of the following options best analyzes how this transformation from a single score to multiple signals works?

Updated 2025-10-05

Contributors are:

Who are from:

Learn Before

Related