Multiple Choice

A language model's policy, which gives the probability of generating an output y for an input x, is defined to be proportional to the exponential of a reward score r(x, y), i.e., π(y | x) ∝ exp(r(x, y)). For a specific input, two candidate outputs have the following reward scores:

  • Output A: Reward = 3.0
  • Output B: Reward = 1.0

Based on this formulation, how does the probability of generating Output A compare to the probability of generating Output B?
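Because π(y | x) = exp(r(x, y)) / Z(x), the normalizing constant Z(x) cancels when two outputs for the same input are compared, so the probability ratio depends only on the reward difference: π(A | x) / π(B | x) = exp(3.0 - 1.0) = e^2. The sketch below checks this numerically. It assumes, purely for illustration, that the policy is normalized over a small finite candidate set; `policy_probs` is a hypothetical helper, not part of any library.

```python
import math

def policy_probs(rewards):
    """Normalize exp(reward) over a finite set of candidate outputs (a softmax).

    Hypothetical helper: assumes the candidate set is exactly the keys of
    `rewards`. The ratio between any two outputs does not depend on which
    other candidates are included, because the normalizer cancels.
    """
    # Subtract the max reward for numerical stability; the shift cancels on normalization.
    m = max(rewards.values())
    weights = {y: math.exp(r - m) for y, r in rewards.items()}
    z = sum(weights.values())
    return {y: w / z for y, w in weights.items()}

probs = policy_probs({"A": 3.0, "B": 1.0})
print(f"P(A)/P(B) = {probs['A'] / probs['B']:.3f}")  # exp(2.0), about 7.389
```

Adding further candidates (for example, an Output C with some other reward) changes the absolute probabilities of A and B but leaves their ratio at e^2.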


Updated 2025-09-26


Tags

Ch.4 Alignment - Foundations of Large Language Models
