Multiple Choice

Two language models generate different reasoning paths (Path A and Path B) for the same complex problem. A reward model scores each step by assigning the log-probability that the step is 'correct'; the total reward for a path is the sum of these per-step log-probabilities.

  • Path A log-probabilities: [-0.4, -0.5, -0.3]
  • Path B log-probabilities: [-0.1, -2.5, -0.2]

Based on this reward calculation method, which statement accurately compares the two paths?
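The reward calculation described above can be checked directly, a minimal sketch using the log-probabilities from the question (variable names are illustrative):

```python
import math

# Per-step log-probabilities from the reward model (given in the question)
path_a = [-0.4, -0.5, -0.3]
path_b = [-0.1, -2.5, -0.2]

# Total reward for a path = sum of its step log-probabilities
reward_a = sum(path_a)  # -1.2
reward_b = sum(path_b)  # -2.8

# Summing log-probabilities multiplies the underlying probabilities,
# so each total corresponds to a whole-path probability
prob_a = math.exp(reward_a)
prob_b = math.exp(reward_b)

# Path A's total reward is higher, even though Path B contains the
# single best step (-0.1): one very poor step (-2.5) drags B's sum down
print(reward_a, reward_b)
```

This illustrates why a sum-of-log-probabilities reward penalizes a path with one weak step more than a path of uniformly moderate steps.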

Updated 2025-10-08

Tags

Ch.5 Inference - Foundations of Large Language Models