1Cademy - Diagnosing Undesirable Model Behavior

Learn Before

Objective Function for Policy Learning in RLHF

Case Study

Based on the provided scenario, identify which component of the optimization process is the most probable source of the model's new, undesirable behaviors. Justify your reasoning by explaining how this component directly influences the final model parameters.

Updated 2025-10-04

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences