1Cademy - A language model generates a response that is evaluated by breaking it into four distinct segments. A reward function assigns a score to each segment based on its quality. The scores for the segments are: Segment 1: +1.2, Segment 2: -0.5, Segment 3: +0.8, and Segment 4: -0.2. If the total reward for the entire response is calculated by summing the rewards of its individual segments, what is the total reward?

Learn Before

Aggregated Reward as the Sum of Segment-Based Rewards

Multiple Choice

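Because the total is defined as the sum of the segment rewards, the arithmetic is 1.2 - 0.5 + 0.8 - 0.2 = 1.3. A minimal sketch of that calculation follows; the names `segment_rewards` and `total_reward` are illustrative assumptions, not part of the original question.

```python
# Segment-level rewards taken from the question statement.
segment_rewards = [1.2, -0.5, 0.8, -0.2]


def total_reward(rewards):
    """Aggregate reward as the plain sum of per-segment rewards.

    Hypothetical helper for illustration only.
    """
    return sum(rewards)


total = total_reward(segment_rewards)

# Format to one decimal place so floating-point rounding noise is not shown.
print(f"Total reward: {total:+.1f}")  # prints "Total reward: +1.3"
```

Negative segment scores lower the total, so the two penalized segments (-0.5 and -0.2) offset part of the +2.0 earned by the other two.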

Updated 2025-10-07
