Multiple Choice

A system for ranking text responses first assigns a numerical reward score to each response, and then calculates a 'worth' value for each response using the formula: worth = exp(reward score). Consider two scenarios:

Scenario 1: Response A has a reward score of 3.0, and Response B has a reward score of 1.0.

Scenario 2: Response C has a reward score of 8.0, and Response D has a reward score of 6.0.

How does the ratio of worths (Worth_A / Worth_B) in Scenario 1 compare to the ratio of worths (Worth_C / Worth_D) in Scenario 2?
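
The comparison can be checked numerically. The following is a minimal sketch that assumes only the formula stated in the question, worth = exp(reward score); the helper names `worth` and `worth_ratio` are hypothetical and introduced here for illustration. Because exp(a) / exp(b) = exp(a - b), each ratio depends only on the difference between the two reward scores, not on their absolute values.

```python
import math


def worth(reward: float) -> float:
    """Worth as defined in the question: exp(reward score)."""
    return math.exp(reward)


def worth_ratio(reward_x: float, reward_y: float) -> float:
    """Ratio of two worths; algebraically equal to exp(reward_x - reward_y)."""
    return worth(reward_x) / worth(reward_y)


# Scenario 1: Response A (3.0) versus Response B (1.0)
ratio_1 = worth_ratio(3.0, 1.0)

# Scenario 2: Response C (8.0) versus Response D (6.0)
ratio_2 = worth_ratio(8.0, 6.0)

print(f"Worth_A / Worth_B = {ratio_1:.4f}")
print(f"Worth_C / Worth_D = {ratio_2:.4f}")
```

Both scenarios have a reward gap of 2.0, so the two printed values can be compared directly against exp(2.0).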


Updated 2025-10-02


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science
