Definition

LLM as the Agent in RLHF

In Reinforcement Learning from Human Feedback (RLHF), the agent, often called the LM agent, is the Large Language Model (LLM) being trained. It interacts with its environment by receiving a text input and returning a generated text response. The agent's behavior is governed by its policy: the function defined by the LLM that gives the conditional probability of generating an output sequence $\mathbf{y}$ given an input sequence $\mathbf{x}$, denoted $\Pr(\mathbf{y} \mid \mathbf{x})$.
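To make the policy concrete, the sketch below shows one way Pr(y | x) might be computed. It assumes the LLM is autoregressive, so the sequence probability is the product of per-token conditional probabilities. This factorization is not stated in the definition above. The function `next_token_probs` is a hypothetical toy stand-in for a real LLM forward pass, and its probabilities are made up for illustration.

```python
import math

def next_token_probs(context):
    """Hypothetical stand-in for an LLM forward pass.

    Returns a toy distribution Pr(token | context) over a three-token
    vocabulary. The rule (depends only on context length) is an
    assumption chosen to keep the example deterministic.
    """
    if len(context) % 2 == 0:
        return {"yes": 0.5, "no": 0.25, "<eos>": 0.25}
    return {"yes": 0.25, "no": 0.25, "<eos>": 0.5}

def sequence_log_prob(x, y):
    """Compute log Pr(y | x), assuming an autoregressive factorization:
    Pr(y | x) = product over t of Pr(y_t | x, y_1..y_{t-1}).
    """
    context = list(x)            # the environment's input text (tokens)
    total = 0.0
    for token in y:              # the agent's generated response (tokens)
        total += math.log(next_token_probs(context)[token])
        context.append(token)    # each new token conditions the next step
    return total

# Environment supplies input x; the agent's policy scores a response y.
x = ["is", "it", "safe", "?"]
y = ["yes", "<eos>"]
prob = math.exp(sequence_log_prob(x, y))
print(f"Pr(y | x) is approximately {prob:.4f}")
```

Working in log space, as here, is a common choice because multiplying many small per-token probabilities directly can underflow for long responses.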


Updated 2026-05-02


Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.4 Alignment - Foundations of Large Language Models
