Definition

Parameterization of the LLM Policy

When reinforcement learning is applied to a Large Language Model (LLM), the LLM itself acts as the policy: the function that selects an action (the next token) given the current state (the prompt and the tokens generated so far). This policy is defined by a set of parameters, commonly denoted by θ, and is therefore often written π_θ. The parameters are the neural network's weights and biases, and training adjusts them to optimize the model's behavior.
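As a rough illustration of what "a policy defined by θ" means, the sketch below builds a toy softmax policy in plain Python. Everything in it is an assumption made for illustration: the names `policy` and `reinforce_step`, the three-token vocabulary, the two-feature state, and the choice of a REINFORCE-style update, which the definition above does not specify. A real LLM has billions of parameters and a far richer architecture, but the relationship between θ and behavior is the same in kind.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy(theta, state):
    """pi_theta(. | state): a probability for each action (next token)."""
    logits = [
        sum(w * s for w, s in zip(row, state)) + b
        for row, b in zip(theta["W"], theta["b"])
    ]
    return softmax(logits)

def reinforce_step(theta, state, action, reward, lr=0.5):
    """One REINFORCE-style update (an illustrative choice of algorithm).

    Moves theta along reward * grad log pi_theta(action | state) and
    returns a new parameter dict, leaving the input untouched.
    """
    probs = policy(theta, state)
    new_W, new_b = [], []
    for k, (row, b) in enumerate(zip(theta["W"], theta["b"])):
        # Gradient of log pi(action) with respect to logit k.
        g = (1.0 if k == action else 0.0) - probs[k]
        new_W.append([w + lr * reward * g * s for w, s in zip(row, state)])
        new_b.append(b + lr * reward * g)
    return {"W": new_W, "b": new_b}

# Hypothetical toy setup: 2 state features, 3 possible tokens.
theta = {"W": [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]], "b": [0.0, 0.0, 0.0]}
state = [1.0, 0.5]

before = policy(theta, state)  # all-zero theta gives a uniform distribution
theta_new = reinforce_step(theta, state, action=2, reward=1.0)
after = policy(theta_new, state)  # token 2 is now more likely
```

The only thing that differs between `before` and `after` is θ: the state is the same, yet the policy assigns more probability to the rewarded token. That is the sense in which adjusting the parameters adjusts the model's behavior.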

Updated 2025-10-08

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences