Learn Before
Identifying Fine-Tuning Methodologies
A legal tech company is developing a system to summarize lengthy contracts. They are considering two different methods for fine-tuning their language model. Analyze both methods described in the case study and determine which one represents an outcome-based approach. Justify your answer by explaining how the chosen method's supervision and reward structure align with the principles of this approach.
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Limitations of Outcome-Based Rewards for Entire Sequences
A team is fine-tuning a language model to act as a programming assistant that writes code. For each programming problem, the model generates a block of code. The fine-tuning process involves running the generated code against a set of predefined tests. If the code passes all the tests, the model receives a high reward. If it fails any test, it receives a low reward. The structure, style, or efficiency of the code itself is not directly evaluated for the reward signal. Which principle of model fine-tuning does this scenario best exemplify?
Analyzing Fine-Tuning Methodologies
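
The programming-assistant scenario listed under Related describes a reward that depends only on whether the generated code passes its tests. A minimal sketch of such a reward is shown below. The function name `outcome_reward`, the `(args, expected)` test format, and the 1.0/0.0 reward values are illustrative assumptions, not part of the course material.

```python
from typing import Any, List, Tuple

def outcome_reward(
    code: str,
    func_name: str,
    tests: List[Tuple[tuple, Any]],
    high: float = 1.0,
    low: float = 0.0,
) -> float:
    """Return `high` only if the generated code passes every test, else `low`.

    Only the final outcome is scored: style, structure, and efficiency
    of the code never enter the reward.
    """
    namespace: dict = {}
    try:
        # Hypothetical setup: real systems would run untrusted code in a sandbox.
        exec(code, namespace)
        func = namespace[func_name]
        for args, expected in tests:
            if func(*args) != expected:
                return low  # a single failing test yields the low reward
    except Exception:
        return low  # crashes and missing functions count as failures
    return high

# Hypothetical test suite and three candidate generations.
tests = [((2, 3), 5), ((0, 0), 0)]
concise = "def add(a, b):\n    return a + b\n"
verbose = (
    "def add(a, b):\n"
    "    total = 0\n"
    "    for x in (a, b):\n"
    "        total += x\n"
    "    return total\n"
)
buggy = "def add(a, b):\n    return a - b\n"

rewards = {
    "concise": outcome_reward(concise, "add", tests),
    "verbose": outcome_reward(verbose, "add", tests),
    "buggy": outcome_reward(buggy, "add", tests),
}
```

In this sketch the concise and verbose solutions receive the same reward because both pass all tests, while the buggy one receives the low reward. No intermediate step of the generation is supervised, which is the defining trait of an outcome-based signal.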