Learn Before
Concept

Return

Our final goal is to choose actions over time so that we could maximize the expected value of the return. The definition of return is as below: The return GtG_t is the total discounted reward from time-step t. Gt=Rt+1+γRt+2+=k=0γkRt+k+1G_t = R_{t+1} + γR_{t+2} + · · · = \sum_{k=0}^{∞}γ^kR_{t+k+1}, where γ is the discounted factor. When γ close to 0 leads to “myopic” evaluation; γ close to 1 leads to “far-sighted” evaluation.

0

1

Updated 2026-01-15

Tags

Data Science

Foundations of Large Language Models Course

Computing Sciences