Problem

Computational Infeasibility of Full Output Summation in Distillation Loss

Applying the cross-entropy loss directly for knowledge distillation is often computationally impractical. The formula sums over the entire set of possible outputs, and that set can be exponentially large: for sequence outputs, for example, the number of candidates grows exponentially with the sequence length. Evaluating the sum exactly is therefore infeasible in many real-world scenarios.
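To make the scale of the problem concrete, here is a minimal sketch. It assumes the loss has the usual cross-entropy form, i.e. the negative sum over every output y of the teacher's probability of y times the log of the student's probability of y, and that outputs are fixed-length token sequences. The helper names (`num_outputs`, `exact_distill_loss`, `make_model`) and the toy per-token models are hypothetical, introduced only for illustration.

```python
import itertools
import math


def num_outputs(vocab_size: int, length: int) -> int:
    """Count the distinct token sequences of exactly `length` tokens."""
    return vocab_size ** length


def exact_distill_loss(teacher, student, vocab, length):
    """Exact loss: -sum over all sequences y of P_teacher(y) * log P_student(y).

    The loop visits every one of len(vocab) ** length sequences, which is
    what makes the exact computation infeasible at realistic sizes.
    """
    loss = 0.0
    for y in itertools.product(vocab, repeat=length):
        loss -= teacher(y) * math.log(student(y))
    return loss


def make_model(token_probs):
    """Hypothetical toy model: tokens are independent with fixed probabilities."""
    def prob(y):
        p = 1.0
        for tok in y:
            p *= token_probs[tok]
        return p
    return prob


# Toy setting: 2 tokens, length 3, so only 2 ** 3 = 8 sequences to enumerate.
vocab = ["a", "b"]
teacher = make_model({"a": 0.9, "b": 0.1})
student = make_model({"a": 0.5, "b": 0.5})
loss = exact_distill_loss(teacher, student, vocab, length=3)

# Illustrative realistic setting (assumed sizes): far too many terms to enumerate.
big = num_outputs(50_000, 20)
```

In the toy setting the sum has 8 terms and runs instantly. With the assumed 50,000-token vocabulary and 20-token outputs, the same sum would have more than 10^90 terms, which is why practical methods have to approximate it rather than evaluate it exactly.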

Updated 2025-10-04

Tags

Ch.3 Prompting - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences
