Formula

Definition of the Teacher's Probability Distribution ($P_t$) in Knowledge Distillation

In knowledge distillation, $P_t$ denotes the teacher model's output probability distribution. It is defined as a conditional distribution, $P_t = \text{Pr}^t(\cdot \mid \mathbf{c}, \mathbf{z})$, which assigns a probability to each possible output given a context $\mathbf{c}$ and a latent variable $\mathbf{z}$.
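The definition above leaves the teacher's internals abstract. The Python sketch below rests on two assumptions: the teacher emits unnormalized logits over a finite output vocabulary, and a softmax turns those logits into $P_t$. The names `teacher_distribution`, `toy_teacher`, and `kl_divergence` are hypothetical, and so are the `temperature` knob and the toy teacher's logic. None of them come from the source. The KL comparison at the end shows one common way $P_t$ serves as a distillation target. It is not necessarily the loss the source uses.

```python
import math
from typing import Callable, Sequence

# Hypothetical type: a teacher maps (context c, latent z) to unnormalized logits.
TeacherLogitsFn = Callable[[str, Sequence[float]], list[float]]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax; temperature > 1 flattens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def teacher_distribution(teacher: TeacherLogitsFn, c: str, z: Sequence[float],
                         temperature: float = 1.0) -> list[float]:
    """P_t = Pr^t(. | c, z): the teacher's output distribution given c and z."""
    return softmax(teacher(c, z), temperature)


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q); terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical toy teacher over a 3-token vocabulary (assumption, for illustration):
# logits come from z, plus a bonus on one token chosen by the context length.
def toy_teacher(c: str, z: Sequence[float]) -> list[float]:
    bonus_index = len(c) % len(z)
    return [zi + (1.0 if i == bonus_index else 0.0) for i, zi in enumerate(z)]


if __name__ == "__main__":
    c = "Summarize the passage:"
    z = [0.5, -0.2, 1.3]
    p_t = teacher_distribution(toy_teacher, c, z, temperature=2.0)
    p_s = [1 / 3, 1 / 3, 1 / 3]  # e.g., an untrained student's uniform guess
    print("P_t:", [round(p, 3) for p in p_t])
    print("KL(P_t || P_s):", round(kl_divergence(p_t, p_s), 4))
```

Conditioning on both $\mathbf{c}$ and $\mathbf{z}$ appears only in the teacher's signature. Whatever the teacher does with them, the softmax guarantees that $P_t$ is a valid distribution: every entry is positive and the entries sum to 1.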


Updated 2025-10-08


Tags

Ch.3 Prompting - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences