Formula

Formula for Scoring Reasoning Paths by Counting Correct Steps

A simple method to score a reasoning path is to count the number of steps classified as 'correct'. This can be represented by the formula:

$$r(\mathbf{x}, \mathbf{y}) = \sum_{k=1}^{n_s} \delta(\text{correct}, C(\mathbf{x}, \bar{\mathbf{y}}_{\le k}))$$

where:

  • $r(\mathbf{x}, \mathbf{y})$ is the total reward for the reasoning path $\mathbf{y}$ given the input $\mathbf{x}$.
  • $n_s$ is the total number of steps in the reasoning path.
  • $C(\mathbf{x}, \bar{\mathbf{y}}_{\le k})$ is the classification output for step $k$, computed from the input and the first $k$ steps and determined by selecting the label with the maximum probability.
  • $\delta(a, b)$ is the Kronecker delta function, which is $1$ if $a = b$ and $0$ otherwise. In this context, it equals $1$ if the classification for step $k$ is 'correct' and $0$ otherwise, so the sum counts the steps labeled 'correct'.
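
As a rough illustration of the formula, the sketch below scores a path from per-step label probabilities. It is a minimal sketch under stated assumptions, not a reference implementation: the label set ('correct', 'incorrect', 'neutral'), the function names classify_step and score_path, and the example probabilities are all hypothetical. In practice the probabilities would come from a step-level classifier applied to the input and the steps up to step k.

```python
from typing import Dict, List


def classify_step(probs: Dict[str, float]) -> str:
    """Play the role of C: return the label with the maximum probability."""
    return max(probs, key=probs.get)


def score_path(step_probs: List[Dict[str, float]]) -> int:
    """Play the role of r(x, y): count the steps whose label is 'correct'.

    Each element of step_probs holds the (assumed) classifier output for
    one step k, already conditioned on the input and the steps up to k.
    """
    # The Kronecker delta becomes an equality test that contributes 1 or 0.
    return sum(1 for probs in step_probs if classify_step(probs) == "correct")


# Hypothetical classifier outputs for a three-step reasoning path.
example_steps = [
    {"correct": 0.8, "incorrect": 0.1, "neutral": 0.1},  # labeled 'correct'
    {"correct": 0.3, "incorrect": 0.6, "neutral": 0.1},  # labeled 'incorrect'
    {"correct": 0.5, "incorrect": 0.2, "neutral": 0.3},  # labeled 'correct'
]

reward = score_path(example_steps)
print(reward)  # 2
```

Here two of the three steps are classified as 'correct', so the reward is 2. Because the score is a raw count, longer paths can earn higher rewards simply by containing more steps.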

Updated 2026-05-03

Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Ch.4 Alignment - Foundations of Large Language Models