Learn Before
Parameterization of the LLM Policy
In the context of reinforcement learning, the Large Language Model (LLM) acts as the policy: the function that maps a state (the prompt plus tokens generated so far) to a probability distribution over next tokens. This policy is defined by a set of parameters, commonly denoted θ, which consist of the neural network's weights and biases. These parameters are adjusted during training to optimize the model's behavior.
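To make the idea concrete, here is a minimal, purely illustrative sketch (not how a real LLM is implemented): a one-layer softmax "policy" over a tiny action space, where the dictionary theta holds exactly the adjustable components the passage describes (weights and biases), and a single REINFORCE-style step nudges θ toward actions that received positive reward. All names (policy, reinforce_update) are hypothetical.

```python
import math
import random

def init_theta(n_features, n_actions, seed=0):
    # theta = the policy's parameters: a weight matrix and a bias vector.
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_actions)]
               for _ in range(n_features)]
    biases = [0.0] * n_actions
    return {"weights": weights, "biases": biases}

def policy(theta, state):
    # pi_theta(a | s): softmax over linear logits computed from the state.
    logits = [
        sum(s * w for s, w in zip(state, col)) + b
        for col, b in zip(zip(*theta["weights"]), theta["biases"])
    ]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_update(theta, state, action_idx, reward, lr=0.1):
    # One policy-gradient step: reward * d(log pi(a|s))/d(theta).
    # For softmax logits, d(log pi(a))/d(logit_i) = 1{i == a} - pi(i).
    probs = policy(theta, state)
    for i in range(len(theta["biases"])):
        grad_logit = (1.0 if i == action_idx else 0.0) - probs[i]
        theta["biases"][i] += lr * reward * grad_logit
        for f, s in enumerate(state):
            theta["weights"][f][i] += lr * reward * grad_logit * s
    return theta
```

After a positive-reward update for a chosen action, the policy assigns that action a higher probability in the same state; this is the sense in which "updating the policy" means adjusting the weights and biases θ.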
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Parameterization of the LLM Policy
A language model is being trained to generate helpful and harmless responses using feedback from a separate quality-assessment model. Arrange the following events into the correct chronological sequence for a single iterative step of this training loop.
An AI team is fine-tuning a language model to write compelling short stories, generated one token at a time. However, the model's outputs are becoming repetitive and nonsensical. Their current process has a reward model evaluate the entire 500-token story only after it is fully completed, providing a single quality score at the very end. Which of the following best explains why this training setup is failing?
In the iterative process of refining a language model using feedback, different components of the model's operation correspond to formal concepts from learning theory. Match each formal concept to its specific implementation in this language model training scenario.
Learn After
An engineering team is refining a large language model to be more helpful and harmless. They use a training process where the model generates responses, receives a quality score for each response, and then updates its internal decision-making function, known as the 'policy'. What specific, adjustable components of the model are being directly modified during this policy update?
The Role of Parameters in an LLM Policy
Analyzing Behavioral Changes in a Trained LLM