In a machine learning system, a penalty is assigned to every generated output segment using the function r(x, y, ȳ) = -1, where x is the input, y is the generated output, and ȳ is the average of all possible outputs. If the system generates an output segment, what is the value of the penalty it receives according to this function?
0
1
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
In a machine learning system, a penalty is assigned to every generated output segment using the function
r(x, y, ȳ) = -1, wherexis the input,yis the generated output, andȳis the average of all possible outputs. If the system generates an output segment, what is the value of the penalty it receives according to this function?Evaluating a Reward Function for Story Generation