Case Study

Evaluating Reasoning Path Quality

A language model generates two different reasoning paths (Path A and Path B) to solve the same problem. A separate reward model evaluates each step of both paths and assigns it a log-probability of being 'correct'. Based on the data below, which path receives the higher total reward score if a path's score is the sum of its steps' log-probabilities? Justify your answer by showing the calculation of each path's total score.
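The scoring rule described in the question can be sketched in a few lines of Python. This is a minimal illustration, not the case study's own solution: the function name `total_reward` and the step values in `path_a` and `path_b` are hypothetical placeholders chosen only to show the arithmetic, and they should be replaced with the actual per-step log-probabilities from the data.

```python
import math


def total_reward(step_log_probs):
    """Return a path's total score: the sum of its per-step log-probabilities."""
    return math.fsum(step_log_probs)


# Hypothetical per-step log-probabilities, for illustration only.
# These are NOT the case study's data.
path_a = [-0.1, -0.3, -0.2]
path_b = [-0.05, -0.9, -0.1]

score_a = total_reward(path_a)
score_b = total_reward(path_b)

# Log-probabilities are at most 0, so the higher score is the one closer to zero.
better = "A" if score_a > score_b else "B"

# Summing log-probabilities is equivalent to multiplying the step probabilities:
# exp(total score) is the product of the per-step probabilities.
joint_prob_a = math.exp(score_a)
joint_prob_b = math.exp(score_b)

print(f"Path A: {score_a:.2f}, Path B: {score_b:.2f}, higher: Path {better}")
```

With these placeholder values, Path B has the single most confident step but also one weak step, and that weak step dominates its sum. A sum of log-probabilities penalizes any low-confidence step, because it corresponds to a product of probabilities.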

Updated 2025-09-26

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science