Multiple Choice

A language model's policy, $\pi_{\theta}$, is being updated by minimizing the loss function $\mathcal{L}(\theta) = -\mathbb{E}[U(\mathbf{x}, \mathbf{y})]$, where $\mathbf{x}$ is a given input, $\mathbf{y}$ is an output generated by the model, and $U(\mathbf{x}, \mathbf{y})$ is a utility function that assigns a high score to desirable outputs and a low score to undesirable ones. What is the direct consequence of minimizing this loss function on the model's behavior?
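One way to explore the question empirically is a toy experiment. The sketch below is an illustration under stated assumptions, not part of the original question: it replaces the language model with a hypothetical softmax policy over three candidate outputs for a single fixed input, assigns made-up utility scores, and runs plain gradient descent on $\mathcal{L}(\theta) = -\mathbb{E}[U(\mathbf{x}, \mathbf{y})]$ using the exact gradient. All names (`softmax`, `train`, `utilities`, the learning rate and step count) are chosen for this example only.

```python
import math

def softmax(logits):
    """Turn logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_utility(probs, utilities):
    """E[U] under the policy: sum over outputs of pi(y | x) * U(x, y)."""
    return sum(p * u for p, u in zip(probs, utilities))

def loss_gradient(logits, utilities):
    """Exact gradient of L = -E[U] with respect to the softmax logits.

    For a softmax policy, dL/dtheta_j = -p_j * (U_j - E[U]).
    """
    probs = softmax(logits)
    baseline = expected_utility(probs, utilities)
    return [-p * (u - baseline) for p, u in zip(probs, utilities)]

def train(logits, utilities, lr=0.5, steps=200):
    """Plain gradient descent on L(theta) = -E[U]."""
    logits = list(logits)
    for _ in range(steps):
        grad = loss_gradient(logits, utilities)
        logits = [z - lr * g for z, g in zip(logits, grad)]
    return logits

# Hypothetical utilities for three candidate outputs of one fixed input x.
utilities = [1.0, 0.2, -0.5]
initial_logits = [0.0, 0.0, 0.0]  # uniform starting policy (an assumption)

final_logits = train(initial_logits, utilities)
initial_probs = softmax(initial_logits)
final_probs = softmax(final_logits)

print("initial policy:", [round(p, 3) for p in initial_probs])
print("final policy:  ", [round(p, 3) for p in final_probs])
print("E[U] before:", round(expected_utility(initial_probs, utilities), 3))
print("E[U] after: ", round(expected_utility(final_probs, utilities), 3))
```

Comparing the printed policies and expected utilities before and after training shows how the update changes which outputs the policy prefers. A real language model would estimate the expectation from sampled outputs rather than enumerate every candidate, so this exact-gradient version is only a simplified stand-in.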

Updated 2025-09-28

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science