Learn Before
Multiple Choice

A machine learning engineer is fine-tuning a large language model using a reinforcement learning approach. They mistakenly define the loss function to be minimized as $\mathcal{L}(\theta) = \mathbb{E}_{\mathbf{x}\sim\mathcal{D}}\, \mathbb{E}_{\mathbf{y}\sim\pi_{\theta}(\cdot|\mathbf{x})} [U(\mathbf{x}, \mathbf{y})]$, where $U(\mathbf{x}, \mathbf{y})$ is a utility function that returns high values for desirable outputs and low values for undesirable ones. What is the most likely outcome of this training process?

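The sign error can be seen directly in a toy simulation. The sketch below (an illustrative assumption, not the question's setting: a two-output softmax policy standing in for an LLM, with a hypothetical utility of 1.0 for the desirable output and 0.0 for the undesirable one) runs gradient *descent* on $\mathcal{L}(\theta) = \mathbb{E}_{\mathbf{y}\sim\pi_\theta}[U(\mathbf{y})]$ exactly as the engineer specified, and watches where the probability mass goes.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical setup: output 0 is "desirable" (utility 1.0),
# output 1 is "undesirable" (utility 0.0).
utilities = [1.0, 0.0]
theta = [0.0, 0.0]  # policy logits, start with a uniform policy
lr = 0.5

p_desirable_start = softmax(theta)[0]

for _ in range(200):
    probs = softmax(theta)
    expected_u = sum(p * u for p, u in zip(probs, utilities))
    # Exact gradient of L(theta) = sum_y pi_theta(y) U(y) w.r.t. the logits:
    #   dL/dtheta_i = pi_i * (U_i - E[U])
    grads = [p * (u - expected_u) for p, u in zip(probs, utilities)]
    # Gradient DESCENT on L, as mistakenly specified: this *reduces*
    # expected utility instead of increasing it.
    theta = [t - lr * g for t, g in zip(theta, grads)]

p_desirable_end = softmax(theta)[0]
print(p_desirable_start, p_desirable_end)
```

Under these assumptions, the probability of the desirable output collapses from 0.5 toward 0: minimizing expected utility trains the policy to prefer undesirable outputs, which is the behavior the question asks about (the correct objective would minimize the *negative* expected utility).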

Updated 2025-10-06


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science