Short Answer

Deconstructing the Reinforcement Learning Loss Function

A common loss function used to update a language model's policy, $\pi_{\theta}$, is given by the formula $\mathcal{L}(\theta) = -\mathbb{E}_{\mathbf{y}\sim\pi_{\theta}(\cdot|\mathbf{x})} [U(\mathbf{x}, \mathbf{y})]$, where $U(\mathbf{x}, \mathbf{y})$ is a function that assigns a high score to desirable outputs. Analyze this formula and explain the specific purpose of two of its key components:

  1. The negative sign (-) at the beginning of the expression.
  2. The expectation ($\mathbb{E}_{\mathbf{y}\sim\pi_{\theta}(\cdot|\mathbf{x})}$) taken over the model's output distribution.
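
As a starting point for the analysis, the formula can be evaluated on a toy problem. The sketch below is illustrative only: it assumes a hypothetical policy over three candidate outputs defined by softmax logits, and a made-up utility table standing in for U(x, y). The names `softmax`, `exact_loss`, and `sampled_loss` are not from the source. It computes the loss both exactly and as a sample average over outputs drawn from the policy.

```python
import math
import random


def softmax(logits):
    """Turn raw logits into a probability distribution over outputs."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def exact_loss(logits, utilities):
    """L(theta) = -sum_y pi_theta(y|x) * U(x, y) for a finite output set."""
    probs = softmax(logits)
    return -sum(p * u for p, u in zip(probs, utilities))


def sampled_loss(logits, utilities, num_samples, rng):
    """Monte Carlo estimate: average -U(x, y) over y sampled from the policy."""
    probs = softmax(logits)
    indices = rng.choices(range(len(probs)), weights=probs, k=num_samples)
    return -sum(utilities[i] for i in indices) / num_samples


# Hypothetical utilities for three candidate outputs (assumed values).
utilities = [0.0, 0.5, 1.0]

uniform_logits = [0.0, 0.0, 0.0]  # policy with no preference
better_logits = [0.0, 0.0, 2.0]   # policy favoring the highest-utility output

rng = random.Random(0)
loss_uniform = exact_loss(uniform_logits, utilities)
loss_better = exact_loss(better_logits, utilities)
loss_estimate = sampled_loss(uniform_logits, utilities, 20000, rng)

print("exact loss, uniform policy:", loss_uniform)
print("exact loss, better policy: ", loss_better)
print("sampled loss, uniform:     ", loss_estimate)
```

Comparing the two exact losses, and comparing the exact value with the sampled estimate, gives concrete numbers to reason about when answering both parts of the question.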


Updated 2025-10-02


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science