Recommending a Fine-Tuning Strategy for an AI Algebra Tutor
Considering the primary goal and the available data described in the case study, which of the two general classes of supervisory approaches for reasoning tasks should the team prioritize? Justify your choice by analyzing the trade-offs of each approach in this specific context.
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Outcome-based Approaches for LLM Fine-Tuning
Process-based Approaches for LLM Fine-Tuning
A team is fine-tuning a large language model to solve complex, multi-step logic puzzles. They are testing two different supervisory approaches:
- Approach 1: The model generates the full sequence of reasoning steps and provides a final answer. A human evaluator then checks only whether the final answer is correct. The model receives a positive signal if the answer is correct and a negative signal if it is incorrect, regardless of the quality of the reasoning steps.
- Approach 2: The model generates its reasoning one step at a time. After each step, a human evaluator checks whether that individual step is logically sound and follows correctly from the previous ones. The model receives a supervisory signal for each intermediate step in its reasoning chain.
What is the fundamental difference in how supervision is applied in these two approaches?
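The contrast between the two approaches can be sketched in code. This is a minimal illustration, not an implementation from the case study: the step evaluator and the worked algebra chain are hypothetical stand-ins for a human judge, and the numeric rewards (+1/-1) are an assumed convention.

```python
def outcome_supervision(final_answer, correct_answer):
    """Approach 1: a single signal for the whole chain,
    based only on whether the final answer matches."""
    return 1.0 if final_answer == correct_answer else -1.0

def process_supervision(steps, step_is_sound):
    """Approach 2: one signal per intermediate step,
    each judged individually by an evaluator."""
    return [1.0 if step_is_sound(i, step) else -1.0
            for i, step in enumerate(steps)]

# Toy chain: the middle step is algebraically wrong
# (2x should equal 6), yet the final answer is right.
steps = ["2x + 4 = 10", "2x = 5", "x = 3"]

# Hypothetical evaluator: a stand-in for the human judge,
# which here flags only step index 1 as unsound.
judge = lambda i, step: i != 1

outcome = outcome_supervision(final_answer="x = 3", correct_answer="x = 3")
process = process_supervision(steps, step_is_sound=judge)

print(outcome)  # 1.0  -- the flawed chain is still rewarded
print(process)  # [1.0, -1.0, 1.0]  -- the bad step is penalized
```

The example makes the distinction concrete: outcome-based supervision can reward reasoning that reaches the right answer through a flawed step, while process-based supervision localizes the error to the specific step that failed.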
Recommending a Fine-Tuning Strategy for an AI Algebra Tutor
A team is fine-tuning a large language model for multi-step reasoning tasks. They are considering two general approaches for providing supervision: one that evaluates only the final answer, and one that evaluates each step of the reasoning process. Classify each of the following scenarios or characteristics by matching it to the correct supervisory approach.