Concept

Cross-Task Generalization of Process Reward Models

A different strategy for mitigating the lack of step-level supervision is to exploit the generalization power of Process Reward Models (PRMs). The method trains a PRM on a source task where step-level annotations are abundant or easy to acquire, then transfers it to a target task that lacks such data, applying its learned verification capability with minimal or no further fine-tuning.
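The transfer recipe above can be sketched in a few lines. This is a minimal illustration, not a real library API: the `ProcessRewardModel` wrapper, the `score_step` interface, and the toy heuristic scorer (standing in for a PRM actually trained on a source task) are all hypothetical names chosen for this example.

```python
# Minimal sketch of cross-task PRM transfer. All names here are
# illustrative assumptions, not a real library API.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProcessRewardModel:
    """Wraps a step scorer trained on a source task (e.g. math reasoning)."""
    # (step, preceding steps) -> estimated probability the step is correct
    score_step: Callable[[str, List[str]], float]

    def score_solution(self, steps: List[str]) -> float:
        # Aggregate per-step scores; min-aggregation is a common choice
        # because a single bad step invalidates the whole chain.
        scores = [self.score_step(s, steps[:i]) for i, s in enumerate(steps)]
        return min(scores) if scores else 0.0


# Stand-in for a PRM trained with dense step labels on the source task.
def toy_source_scorer(step: str, context: List[str]) -> float:
    # Toy heuristic in place of a learned verifier.
    return 0.1 if "contradiction" in step.lower() else 0.9


prm = ProcessRewardModel(score_step=toy_source_scorer)

# Zero-shot transfer: apply the source-trained PRM to candidate solutions
# from a different target task, without further fine-tuning, and rerank.
candidates = [
    ["Parse the input.", "Sort the list.", "Return the first element."],
    ["Parse the input.", "This step is a contradiction.", "Return."],
]
best = max(candidates, key=prm.score_solution)
```

In practice the toy scorer would be replaced by the trained PRM's forward pass; the point of the sketch is that the verification interface is task-agnostic, so the reranking loop needs no change on the target task.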


Updated 2026-05-06


Tags

Ch.5 Inference - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences