Direct Supervision via Knowledge Distillation Loss in Weak-to-Strong Generalization

In weak-to-strong generalization, rather than using a weak model to generate a fixed synthetic dataset for fine-tuning, a strong model can be supervised by the weak model directly during training. This is achieved by incorporating a knowledge distillation (KD) loss: for each input, the strong model's output distribution is compared to the weak model's output distribution, and the resulting KD loss is used to update the strong model's parameters. In this setup the weak model plays the role of the teacher and the strong model the role of the student, so the strong model learns on the fly from the weak supervisor's predictions rather than from a static set of weak labels.
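As a concrete illustration, below is a minimal PyTorch sketch of one such training step, not a reference implementation. It assumes Hugging Face-style causal language models whose forward pass returns `.logits`; the names `weak_model`, `strong_model`, `batch`, `kd_loss`, `train_step`, and the `temperature` parameter are all hypothetical. The KD loss here is the token-level KL divergence between the weak model's output distribution (teacher) and the strong model's output distribution (student); only the strong model receives gradients.

```python
import torch
import torch.nn.functional as F

def kd_loss(strong_logits, weak_logits, temperature=1.0):
    """KL(weak || strong) per token: the weak model acts as the teacher,
    the strong model as the student. Hypothetical helper for illustration."""
    vocab = strong_logits.size(-1)
    # Student log-probabilities and teacher probabilities, flattened to
    # (num_tokens, vocab) so 'batchmean' averages over tokens.
    student_log_probs = F.log_softmax(strong_logits / temperature, dim=-1).view(-1, vocab)
    teacher_probs = F.softmax(weak_logits / temperature, dim=-1).view(-1, vocab)
    # F.kl_div expects log-probabilities as input and probabilities as target;
    # the temperature**2 factor keeps gradient scale comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature**2

def train_step(strong_model, weak_model, batch, optimizer):
    """One direct-supervision update: the weak supervisor is frozen,
    and the KD loss updates only the strong model's parameters."""
    with torch.no_grad():  # no gradients flow through the weak supervisor
        weak_logits = weak_model(batch["input_ids"]).logits
    strong_logits = strong_model(batch["input_ids"]).logits
    loss = kd_loss(strong_logits, weak_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the weak model is queried at training time, the supervision signal is its full predictive distribution rather than hard pseudo-labels, which is what distinguishes this direct-supervision setup from the synthetic-dataset variant.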

Updated 2026-05-02

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
