Learn Before
Interpreting the Advantage Function
When training a language model using reinforcement learning, the 'advantage' of generating a particular token is calculated. If the calculated advantage for generating a specific token is a negative number, what does this imply about the combination of the immediate reward received and the change in expected future rewards? Explain the relationship between the components that leads to this negative outcome.
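The question above refers to the advantage of an action. A minimal sketch, assuming the standard one-step temporal-difference form A = r + γ·V(s') − V(s) (the function name and example numbers are illustrative, not from the card):

```python
def td_advantage(reward, value_next, value_current, gamma=0.99):
    """One-step TD advantage: A = r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_current

# A negative advantage means the immediate reward plus the discounted
# value of the next state falls short of the current state's value:
# generating this token lowered the expected total return relative
# to the baseline estimate for the current state.
a = td_advantage(reward=0.1, value_next=0.8, value_current=1.5, gamma=0.9)
# 0.1 + 0.9 * 0.8 - 1.5 = -0.68, a negative advantage
```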
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Value Function Loss Minimization in RLHF
A language model is being trained to generate text. At a certain step, it considers generating the next token. The system has the following estimates:
- The value (expected future rewards) of the current state is 1.2.
- After generating a specific token, the immediate reward received is +0.5.
- The value of the new state after generating the token is 1.0.
- The discount factor for future rewards is 0.9.
Based on the standard temporal difference method for estimating the advantage, what is the advantage of taking this action, and what does it imply?
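If the advantage is estimated with the standard one-step temporal-difference formula A = r + γ·V(s') − V(s), the quantities listed above plug in directly; a sketch (function name is illustrative):

```python
def td_advantage(reward, value_next, value_current, gamma):
    # One-step TD advantage: A = r + gamma * V(s') - V(s)
    return reward + gamma * value_next - value_current

# Values from the card: V(s) = 1.2, r = +0.5, V(s') = 1.0, gamma = 0.9
adv = td_advantage(reward=0.5, value_next=1.0, value_current=1.2, gamma=0.9)
# 0.5 + 0.9 * 1.0 - 1.2 = 0.2: a positive advantage, so the action
# performed better than the baseline expectation for the current state.
```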
Policy Improvement Decision
Interpreting the Advantage Function