1Cademy - Optimizing a Student Model's Training

Learn Before

Dynamic Adjustment of the Knowledge Distillation Coefficient (λ)

Case Study

Optimizing a Student Model's Training

Based on the scenario, what specific change to the weighting between the two training objectives would you recommend to improve the small model's final performance? Justify your recommendation by explaining how this change would affect the model's learning process over time.
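
To make "the weighting between the two training objectives" concrete, here is a minimal sketch of how a coefficient λ might blend a task loss with a distillation loss and change over training. It is not the scenario's actual setup. The convention loss = (1 - λ) * task_loss + λ * distill_loss, the linear schedule, and the names `lambda_schedule` and `combined_loss` are all assumptions made for illustration. Which direction λ should move is left to the start and end values you pass in.

```python
def lambda_schedule(step, total_steps, lam_start, lam_end):
    """Linearly interpolate the distillation weight from lam_start to lam_end.

    Hypothetical helper: the linear shape is an assumption, not the
    schedule used in the case-study scenario.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp training progress to [0, 1] so out-of-range steps stay bounded.
    frac = min(max(step / total_steps, 0.0), 1.0)
    return lam_start + frac * (lam_end - lam_start)


def combined_loss(task_loss, distill_loss, lam):
    """Blend the two objectives (assumed convention: lam weights distillation)."""
    return (1.0 - lam) * task_loss + lam * distill_loss


if __name__ == "__main__":
    total = 100
    for step in (0, 50, 100):
        lam = lambda_schedule(step, total, lam_start=0.9, lam_end=0.1)
        # Fixed placeholder loss values, used only to show how the blend shifts.
        print(step, round(lam, 3), round(combined_loss(2.0, 4.0, lam), 3))
```

With a schedule like this, a recommendation can be stated as a choice of `lam_start` and `lam_end`, and justified by describing which objective dominates the gradient early in training and which dominates late.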

Updated 2025-09-29

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Computing Sciences

Foundations of Large Language Models Course