A system models preferences by first assigning a numerical reward score to a response and then converting it to a 'worth' value using the formula: worth = exp(reward_score). An engineer improves a response, causing its reward score to increase first from 2.0 to 3.0, and then with a further improvement, from 3.0 to 4.0. How does the increase in the response's 'worth' value during the first improvement compare to the increase during the second improvement?
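A quick numerical check of the two increments, sketched in Python (the `worth = exp(reward_score)` formula is from the card; variable names are illustrative):

```python
import math

# worth = exp(reward_score), per the card's formula
first_increase = math.exp(3.0) - math.exp(2.0)   # improvement from 2.0 to 3.0
second_increase = math.exp(4.0) - math.exp(3.0)  # improvement from 3.0 to 4.0

# Because exp grows multiplicatively, each unit step in the reward
# scales the worth increment by a factor of e:
ratio = second_increase / first_increase
print(ratio)  # ≈ e ≈ 2.718
```

Algebraically, (e⁴ − e³)/(e³ − e²) = e³(e − 1)/e²(e − 1) = e, so the second improvement adds e times as much worth as the first.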
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Policy Proportional to Exponentiated Reward
A system for ranking text responses first assigns a numerical reward score to each response, and then calculates a 'worth' value for each response using the formula: worth = exp(reward score). Consider two scenarios:
Scenario 1: Response A has a reward score of 3.0, and Response B has a reward score of 1.0. Scenario 2: Response C has a reward score of 8.0, and Response D has a reward score of 6.0.
How does the ratio of worths (Worth_A / Worth_B) in Scenario 1 compare to the ratio of worths (Worth_C / Worth_D) in Scenario 2?
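The two scenarios can be compared directly with a short Python sketch (the formula is from the card; names are illustrative):

```python
import math

ratio_1 = math.exp(3.0) / math.exp(1.0)  # Scenario 1: Worth_A / Worth_B
ratio_2 = math.exp(8.0) / math.exp(6.0)  # Scenario 2: Worth_C / Worth_D

# exp turns equal score differences into equal worth ratios:
# exp(a) / exp(b) = exp(a - b), and both score gaps are 2.0.
print(ratio_1, ratio_2)  # both ≈ e^2 ≈ 7.389
```

Because exp(a)/exp(b) depends only on the difference a − b, the two ratios are identical: only the reward gap matters, not the absolute scores.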
A system for modeling human preferences assigns a numerical reward score, r, to a given text response. This score can be positive, negative, or zero. To use these scores in a specific type of ranking probability model, each score r must be converted into a 'worth' value α that is always positive and strictly increases as r increases. A researcher proposes using the function α = r² + 0.1 for this conversion. Which statement correctly analyzes the suitability of this proposed function?
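A minimal check of the proposed conversion α = r² + 0.1 (illustrative Python; the function is from the card above):

```python
# alpha = r**2 + 0.1 is always positive, but it is NOT strictly
# increasing in r: for negative rewards, a higher score yields a
# lower worth, and distinct scores can collapse to the same worth.
def alpha(r):
    return r ** 2 + 0.1

assert alpha(-2.0) > alpha(-1.0)  # -1.0 is the better score, yet its worth is lower
assert alpha(-1.0) == alpha(1.0)  # two different scores map to the same worth
```

This is why exp(r) is the standard choice: it is positive everywhere and strictly increasing for all real r, which r² + 0.1 is not.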