1Cademy - Evaluating Training Objectives for a Chatbot

Evaluating Training Objectives for a Chatbot

A company is developing a customer service chatbot and has two primary training datasets. Dataset A consists of customer queries, each paired with a single ideal response written by an expert. The training goal is to maximize the likelihood that the model generates this exact ideal response. Dataset B consists of customer queries, each paired with two different model-generated responses and a label indicating which response a human preferred. The training goal is for the model to generate responses that humans are more likely to prefer.

Analyze these two training approaches. Which approach is better suited for ensuring factual accuracy, and which is better for capturing a helpful and polite tone? Justify your reasoning by explaining the fundamental difference in their optimization objectives.
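
As a starting point for the analysis, the two objectives can be sketched as loss functions. This is a minimal illustration, not the company's actual training code. The function names are hypothetical, the Dataset A loss assumes per-token log-probabilities of the expert response are available, and the Dataset B loss assumes a Bradley-Terry pairwise form over scalar response scores, which is a common choice in preference-based training but is not specified in the scenario.

```python
import math


def sft_loss(token_log_probs):
    """Dataset A objective (assumed form): negative log-likelihood of the
    expert response, given the model's log-probability for each of its tokens."""
    return -sum(token_log_probs)


def preference_loss(score_preferred, score_rejected):
    """Dataset B objective (assumed Bradley-Terry form):
    -log(sigmoid(score_preferred - score_rejected)).
    The scores are hypothetical scalar ratings of the two responses."""
    margin = score_preferred - score_rejected
    # Numerically stable log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Dataset A: the loss falls only as the exact expert tokens become more likely.
print(sft_loss([math.log(0.9), math.log(0.8)]))

# Dataset B: the loss depends only on how the two responses rank against each other.
print(preference_loss(2.0, 0.5))  # preferred response scored higher: small loss
print(preference_loss(0.5, 2.0))  # preferred response scored lower: large loss
```

Under these assumptions, the first loss ties the model to one reference text per query, while the second constrains only the relative ranking of two candidate responses.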

Updated 2025-10-06
