LLM as the Agent in RLHF
In Reinforcement Learning from Human Feedback (RLHF), the agent, often called an LM agent, is the Large Language Model (LLM) being trained. It interacts with its environment by receiving a text input (the prompt) and returning a generated text response (its action). The agent's decision-making is governed by its policy: the conditional probability of generating an output sequence $y$ given an input sequence $x$, as defined by the LLM's parameters $\theta$ and denoted $\pi_\theta(y \mid x)$.
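Because the policy is simply the model's conditional distribution over output sequences, $\log \pi_\theta(y \mid x)$ can be read off directly from token-level log-probabilities. Below is a minimal sketch, assuming a Hugging Face causal LM; the model name, prompt, and response strings are illustrative choices, not from this card:

```python
# Minimal sketch: treating an LLM's conditional token distribution as the
# RLHF policy pi_theta(y | x). Model choice (gpt2) and the prompt/response
# strings are illustrative assumptions, not from the source card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "What is the capital of France?"       # input x from the environment
response = " The capital of France is Paris."   # output y from the agent

# Caveat: tokenizing prompt and prompt+response separately assumes the
# tokenization splits cleanly at the boundary, which usually holds for
# BPE when the response starts with a space.
prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
full_ids = tokenizer(prompt + response, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(full_ids).logits  # shape: (1, seq_len, vocab_size)

# log pi_theta(y | x) = sum over response tokens of log p(token_t | tokens_<t).
# Logits at position t predict token t+1, so shift targets by one.
log_probs = torch.log_softmax(logits, dim=-1)
target_ids = full_ids[:, 1:]
token_log_probs = log_probs[:, :-1].gather(-1, target_ids.unsqueeze(-1)).squeeze(-1)

# The first response token is predicted at logits index prompt_len - 1.
response_start = prompt_ids.shape[1] - 1
log_pi = token_log_probs[:, response_start:].sum()
print(f"log pi_theta(y|x) = {log_pi.item():.2f}")
```

During RLHF, it is this quantity that the policy-gradient update (e.g., PPO) pushes up or down according to the reward assigned to the sampled response.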
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Related
A new automated vacuum cleaner is programmed to learn the most efficient path to clean a room. It uses its sensors to detect its current location, the position of furniture, and the amount of dirt on the floor. Based on this information, it chooses to move forward, turn left, or turn right. After each cleaning session, it receives a positive signal based on the total area cleaned and a negative signal for each time it bumps into an obstacle. It uses these signals to improve its cleaning path for the next session. In this learning system, what component is the 'agent'?
LLM as the Agent in RLHF
Identifying the Agent in a Game-Playing AI
Distinguishing System Components in a Learning Scenario
A self-driving car is being trained to navigate a city. Analyze the components of this system and match each component with its correct functional role in the learning process.
Historical Development of RLHF
Policy Learning in RLHF
Justification for Using RLHF over Supervised Learning
Bridging Language Modeling and Reinforcement Learning Notations in RLHF
Architectural Components of an RLHF System
Three-Stage Training Process of RLHF
Refinements and Alternatives to RLHF
Rationale for End-of-Sequence Rewards in RLHF
High-Level Process of RLHF with PPO
Limitations of Human Feedback in LLM Alignment
Computational and Stability Challenges of RLHF
Goal of RLHF
Origin and Application of RLHF
Dual Learning Tasks of RLHF: Reward and Policy Learning
Four-Stage Process of Reinforcement Learning from Human Feedback (RLHF)
RLHF Training Process with PPO
An AI development team is considering two different methods for training a conversational assistant to be more helpful and aligned with user expectations. Method 1 involves having human experts write a large dataset of ideal, high-quality responses to various prompts, and then training the AI to imitate these examples. Method 2 involves having the AI generate several responses to each prompt, and then asking human experts to simply rank these responses from best to worst. This ranking data is then used to train a separate 'preference model' that provides a reward signal to guide the AI's learning process. Which statement best analyzes the primary advantage of Method 2 over Method 1?
LLM as the Agent in RLHF
Reward Model as an Environment Proxy in RLHF
A team is using human feedback to improve a language model's ability to follow instructions safely and helpfully. Arrange the following high-level stages of this process into the correct chronological order.
RLHF Objective Function
Comparison of Objectives: Supervised Fine-Tuning vs. RLHF
Evaluating a Training Method for a High-Stakes Application
Diagnosing Instability in an RLHF + PPO Training Run
Choosing and Justifying an RLHF Objective Under Competing Product Constraints
Interpreting Conflicting RLHF Signals: Reward Model Ranking vs. PPO Updates Under KL Regularization
Root-Cause Analysis of a “Reward Hacking” Spike During RLHF with PPO
Tuning an RLHF + PPO Update When Reward Improves but Behavior Regresses
Post-Deployment Drift After RLHF: Diagnosing Reward Model and PPO/KL Interactions
Designing an RLHF Training Blueprint for a Regulated Customer-Support LLM
You’re running an RLHF fine-tuning job for an inte...
You are reviewing an RLHF training run for an inte...
Your team is running RLHF for a customer-facing LL...
A text-generation model is being optimized to produce high-quality responses. The process starts with an input prompt. The model then generates a sequence of text. This generated text is passed to a separate automated scoring system, which outputs a single numerical value representing the response's quality. The model's internal configuration is then updated based on this score to improve its future outputs. Match each abstract component of a learning system (left column) to its concrete implementation in this text-generation scenario (right column).
LLM as the Agent in RLHF
A team is improving a text-generation model. The process involves providing the model with an input prompt, to which the model generates a textual response. A human evaluator then assigns a numerical score to this response based on its quality. This score is used to adjust the model's behavior for future responses. If this entire process is described using the framework of a system learning from sequential decisions, what component of the text-generation process corresponds to the 'policy'?
The Agent-Environment Interaction Loop in Reinforcement Learning
Agent-Environment Interaction Loop in Reinforcement Learning
Deconstructing a Model Training Interaction
Architecture and Function of the RLHF Value Model
Target Model (Policy Model) in RLHF
Reference Policy Definition in RLHF
Architecture and Function of the RLHF Reward Model
A development team is building a system to align a large language model using reinforcement learning from human feedback. Their setup includes a target model for text generation, a reference model, a reward model to score outputs based on human preferences, and a value model to predict future rewards. For computational efficiency, they decide to build the reward model using a Convolutional Neural Network (CNN) and the value model using a Recurrent Neural Network (RNN), while keeping the target and reference models as Transformer decoders. What is the most significant architectural inconsistency in this design compared to a standard implementation?
LLM as the Agent in RLHF
An alignment process for a large language model uses a system composed of four distinct models, all sharing a common underlying architecture. Match each model component with its primary role in this system.
Architectural Consistency in Feedback-Based LLM Alignment
In a typical system for aligning a language model with human feedback, it is common practice to use a Transformer-based architecture for the text-generating models, while employing simpler, non-Transformer architectures for the reward and value models to reduce computational overhead.
Learn After
Policy in the Context of LLMs
LLM Policy as a Probability Distribution
Identifying the Agent and Action in a Training Scenario
When a language model is fine-tuned using a system that incorporates human preferences, this process is often conceptualized within a reinforcement learning framework. Which of the following statements correctly analyzes the components of this interaction?
When training a language model using a framework that incorporates human feedback, standard reinforcement learning terminology is used. Match each reinforcement learning term on the left with its corresponding component or concept in this specific language model training context on the right.