Multiple Choice

An AI development team is in a training phase whose goal is to make a language model's responses more aligned with human preferences. They use an optimization process that minimizes a loss function L, which takes as inputs a prompt x, a set of model-generated responses {y1, y2, ...}, and a component r. How does this loss function L primarily guide the model's policy toward generating better responses?
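The question leaves the exact form of L open. One common instantiation in preference alignment is a REINFORCE-style objective, where r is a reward model scoring each response and minimizing L pushes probability mass toward high-reward responses. The sketch below is an illustrative assumption, not the question's required answer; the function name `policy_loss` and the baseline subtraction are choices made here for clarity.

```python
def policy_loss(log_probs, rewards, baseline=None):
    """Sketch of a REINFORCE-style alignment loss:
        L = -(1/N) * sum_i (r_i - b) * log p(y_i | x)
    log_probs: log p(y_i | x) under the current policy for each sampled response
    rewards:   reward-model scores r(x, y_i) for the same responses
    baseline:  optional variance-reducing baseline b (defaults to the mean reward)

    Minimizing L raises the log-probability of responses scored above the
    baseline and lowers it for responses scored below it.
    """
    if baseline is None:
        baseline = sum(rewards) / len(rewards)
    n = len(log_probs)
    return -sum((r - baseline) * lp for lp, r in zip(log_probs, rewards)) / n
```

For example, raising the policy's log-probability on the higher-reward response (all else equal) lowers the loss, which is exactly how minimizing L steers the policy toward preferred outputs.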

Updated 2025-09-26

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science