Example

Visual Diagram of Combined Loss Training for Weak-to-Strong Generalization

This diagram illustrates a training process for a large model using a combined loss objective, a technique used in weak-to-strong generalization. In this setup, a large model takes an input 'x' from a dataset and produces an output 'y'. The model is trained by jointly minimizing two loss functions: 1) a standard Language Model (LM) loss, which compares the model's output to ground-truth data, and 2) a Knowledge Distillation (KD) loss, which encourages the large model's output distribution to match the predictions of a smaller, weaker 'teacher' model. These two losses are combined in the 'Compute Loss & Train' step and the result is used to update the large model's parameters.
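The combined objective described above can be sketched in a few lines. This is a minimal illustration, not the book's implementation: the weighting parameter `alpha`, the function names, and the choice of KL divergence for the KD term are assumptions made here for concreteness.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def combined_loss(student_logits, weak_logits, labels, alpha=0.5):
    """Weighted sum of an LM loss and a KD loss (illustrative sketch).

    student_logits: (batch, vocab) logits from the large model
    weak_logits:    (batch, vocab) logits from the weak teacher
    labels:         (batch,) ground-truth token ids
    alpha:          weight on the LM term (hypothetical hyperparameter)
    """
    p_student = softmax(student_logits)
    # 1) LM loss: cross-entropy against the ground-truth tokens.
    lm_loss = -np.mean(np.log(p_student[np.arange(len(labels)), labels]))
    # 2) KD loss: KL(weak teacher || student) over the output distributions.
    p_weak = softmax(weak_logits)
    kd_loss = np.mean(
        np.sum(p_weak * (np.log(p_weak) - np.log(p_student)), axis=-1)
    )
    # Combine both terms; a gradient of this scalar would drive the update
    # in the 'Compute Loss & Train' step.
    return alpha * lm_loss + (1 - alpha) * kd_loss
```

Note that when the student already matches the weak teacher, the KD term vanishes, and `alpha` trades off imitating the ground truth against imitating the weak supervisor.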


Updated 2025-10-09

Tags

Ch.4 Alignment - Foundations of Large Language Models
