Multiple Choice

An agent is learning to generate a five-sentence summary of a document. It only receives a final quality score (e.g., +0.9) after the entire summary is complete. To improve training, this single final score is used to create a learning signal for each of the five sentences generated. Which of the following options best analyzes how this transformation from a single score to multiple signals works?

0

1

Updated 2025-10-05

Contributors are:

Who are from:

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science