Multiple Choice

A language model generates a three-step reasoning path to solve a problem. A reward model evaluates each step and provides the following log-probabilities of each step being 'correct':

  • Step 1: -0.2
  • Step 2: -0.5
  • Step 3: -0.1

According to the method that calculates the total reward by summing the log-probabilities of each step, what is the final reward score for this entire path?

0

1
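
The summation described in the question can be sketched as follows. This is a minimal illustration, not a definitive implementation: the names `step_log_probs` and `path_reward` are hypothetical, and the conversion back to a probability assumes the path score is read as the log of the product of the step probabilities.

```python
import math

# Per-step log-probabilities of being 'correct', taken from the question.
step_log_probs = [-0.2, -0.5, -0.1]


def path_reward(log_probs):
    """Sum step-level log-probabilities into one path-level score.

    math.fsum is used instead of sum() to avoid floating-point
    rounding drift when adding decimal fractions.
    """
    return math.fsum(log_probs)


total = path_reward(step_log_probs)

# Assumption: summing logs corresponds to multiplying probabilities,
# so exp(total) is the probability implied for the whole path.
joint_prob = math.exp(total)

print(f"path reward: {total:.1f}")
print(f"implied joint probability: {joint_prob:.3f}")
```

Because every log-probability is at most zero, each additional step can only lower the total or leave it unchanged, so longer paths tend to receive lower summed scores.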

Updated 2025-10-03

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science