Learn Before
A system assigns a 'worth' value to potential text completions, calculated as the exponential of a reward score. Initially, three completions (A, B, C) have reward scores of 2.0, 3.0, and 4.0, respectively. If the reward score for each completion is increased by a constant value of 1.0, how does this change affect the ratio of worth between any two completions (e.g., the ratio of worth(B) to worth(A))?
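One way to explore this question numerically is the minimal sketch below. It assumes only what the prompt states, namely that worth is the exponential of the reward score. The names `worth`, `rewards`, and `shift` are illustrative and do not come from the course material.

```python
import math


def worth(reward: float) -> float:
    """Worth of a completion, taken here as exp(reward) per the prompt."""
    return math.exp(reward)


# Reward scores from the prompt (hypothetical variable names).
rewards = {"A": 2.0, "B": 3.0, "C": 4.0}
shift = 1.0  # constant added to every reward score

# Ratio of worth(B) to worth(A) before and after the shift.
ratio_before = worth(rewards["B"]) / worth(rewards["A"])
ratio_after = worth(rewards["B"] + shift) / worth(rewards["A"] + shift)

# Every individual worth value is multiplied by exp(shift).
scale_factor = worth(rewards["A"] + shift) / worth(rewards["A"])

print(f"ratio before shift: {ratio_before:.6f}")
print(f"ratio after shift:  {ratio_after:.6f}")
print(f"each worth scaled by: {scale_factor:.6f}")
```

The reasoning behind the numbers: exp(r + c) = exp(c) · exp(r), so adding a constant c multiplies every worth by the same factor exp(c), and that factor cancels in any ratio of two worths.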
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Plackett-Luce Selection Probability Formula
Calculating Response Worth for an AI Assistant
In a system that assigns a 'worth' value to a response by taking the exponential of its reward score, doubling the reward score for a response will also double its assigned worth value.