Learn Before
Analyzing Fine-Tuning Methodologies
A research team is fine-tuning a language model to summarize news articles. The model is trained to first extract key sentences from the article and then generate a summary based on them. The fine-tuning process provides a reward signal based on two criteria: (1) the factual accuracy of the final summary compared to a human-written reference, and (2) whether the intermediate sentences the model extracts match a predefined list of 'golden' key sentences. Based on the principles of fine-tuning, explain why this approach is not a purely outcome-based method.
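To make the distinction concrete, here is a minimal, self-contained Python sketch of how such a mixed reward signal might be computed. Everything in it (the overlap-based scoring, the equal 0.5/0.5 weighting, all function names) is an illustrative assumption, not something specified in the scenario:

```python
# Minimal sketch of the mixed reward described above.
# All names, scoring rules, and weights are illustrative assumptions.

def outcome_reward(summary: str, reference: str) -> float:
    """Outcome-based term: crude word-overlap proxy for the factual
    accuracy of the final summary against a human-written reference."""
    summary_words = set(summary.lower().split())
    reference_words = set(reference.lower().split())
    if not reference_words:
        return 0.0
    return len(summary_words & reference_words) / len(reference_words)

def process_reward(extracted: list[str], golden: list[str]) -> float:
    """Process-based term: fraction of 'golden' key sentences the model
    recovered during its intermediate extraction step."""
    if not golden:
        return 0.0
    return len(set(extracted) & set(golden)) / len(golden)

def total_reward(summary: str, reference: str,
                 extracted: list[str], golden: list[str],
                 w_outcome: float = 0.5, w_process: float = 0.5) -> float:
    # The second term scores an intermediate step of the model's pipeline
    # (sentence extraction), not just its final output. That is what makes
    # this signal process-supervised rather than purely outcome-based.
    return (w_outcome * outcome_reward(summary, reference)
            + w_process * process_reward(extracted, golden))
```

Because `process_reward` evaluates the intermediate extraction step rather than the final summary alone, any training signal that includes it supervises the process, not only the outcome.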
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Limitations of Outcome-Based Rewards for Entire Sequences
A team is fine-tuning a language model to act as a programming assistant that writes code. For each programming problem, the model generates a block of code. The fine-tuning process involves running the generated code against a set of predefined tests: if the code passes all the tests, the model receives a high reward; if it fails any test, it receives a low reward. The structure, style, and efficiency of the code itself are not directly evaluated for the reward signal (a minimal sketch of such a reward follows the related items below). Which principle of model fine-tuning does this scenario best exemplify?
Identifying Fine-Tuning Methodologies
Analyzing Fine-Tuning Methodologies
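For contrast with the mixed reward sketched earlier, the scenario in 'Limitations of Outcome-Based Rewards for Entire Sequences' can be expressed as a purely outcome-based signal. The test-runner interface below is a hypothetical assumption for illustration, not a real library API:

```python
# Hypothetical sketch of a purely outcome-based reward for generated code.
# `run_tests`, the test representation, and the reward values are all
# illustrative assumptions.

from typing import Callable

def run_tests(code: str, tests: list[Callable[[str], bool]]) -> bool:
    """Run every predefined test against the generated code;
    returns True only if all tests pass."""
    return all(test(code) for test in tests)

def outcome_only_reward(code: str, tests: list[Callable[[str], bool]],
                        pass_reward: float = 1.0,
                        fail_reward: float = 0.0) -> float:
    # The reward depends solely on the final result (all tests pass or not).
    # The structure, style, and efficiency of the code are never inspected,
    # which is what makes this signal purely outcome-based.
    return pass_reward if run_tests(code, tests) else fail_reward
```

Note the design choice: no term in this reward examines any intermediate step or property of the generated code, so the model is supervised only on whether the end result succeeds.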