Learn Before
A company is training a language model to act as an automated assistant for processing loan applications. The model must follow a specific, legally mandated, multi-step procedure to ensure fairness and compliance (e.g., checking credit history, verifying income, providing specific disclosures). The company decides to train the model using a system that provides a large positive reward only if the final loan decision (approve/deny) is correct based on the applicant's overall profile. What is the most significant weakness of this training strategy?
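To make the weakness concrete, here is a minimal illustrative sketch (all names and values are hypothetical, not from any real system): an outcome-only reward gives identical credit to a trajectory that follows every mandated step and one that skips them, so training cannot distinguish compliant reasoning from shortcuts.

```python
# Hypothetical sketch of an outcome-based vs. process-aware reward.
# REQUIRED_STEPS stands in for the legally mandated procedure.
REQUIRED_STEPS = {"check_credit", "verify_income", "provide_disclosures"}

def outcome_reward(trajectory, correct_decision):
    """Reward depends only on the final decision, not the steps taken."""
    return 1.0 if trajectory["decision"] == correct_decision else 0.0

def process_reward(trajectory, correct_decision):
    """A process-aware alternative: scale reward by step compliance."""
    step_score = len(trajectory["steps"] & REQUIRED_STEPS) / len(REQUIRED_STEPS)
    return outcome_reward(trajectory, correct_decision) * step_score

compliant = {"steps": {"check_credit", "verify_income", "provide_disclosures"},
             "decision": "approve"}
shortcut = {"steps": {"check_credit"},  # skips mandated disclosures
            "decision": "approve"}

# Both trajectories earn the same outcome-only reward, so the skipped
# disclosures are invisible to the training signal.
print(outcome_reward(compliant, "approve"))  # 1.0
print(outcome_reward(shortcut, "approve"))   # 1.0

# The process-aware reward penalizes the non-compliant shortcut.
print(process_reward(shortcut, "approve"))   # lower than the compliant case
```

This is the core of the intended answer: rewarding only the outcome leaves the mandated intermediate process unsupervised, inviting reward hacking on compliance-critical steps.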
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Example of an Outcome-Based Reward Model in Mathematics
Insufficiency of Outcome-Based Rewards for Complex Reasoning
Evaluating Reward Model Suitability
Reward Model Suitability for a Creative Task