Definition of Teacher's Probability Distribution (Pt) in Knowledge Distillation
In the context of knowledge distillation, $P_t$ represents the teacher model's output probability distribution. It is formally defined as a conditional probability, $P_t(y \mid x, z)$, which gives the probability of an output $y$ given a context $x$ and a latent variable $z$.
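To make the definition concrete, the following is a minimal sketch of how a teacher distribution of this kind can be computed from the teacher's logits and then consumed by a KL-divergence distillation loss (see the related "KL Divergence Loss for Knowledge Distillation" card below). It assumes PyTorch; the tensor names, vocabulary size, and temperature are illustrative placeholders, not values from this note.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size = 8

# Hypothetical next-token logits produced by the teacher and the student
# models for the same context x (placeholder values for illustration).
teacher_logits = torch.randn(vocab_size)
student_logits = torch.randn(vocab_size)

temperature = 2.0  # optional softening; T = 1 recovers the plain softmax

# P_t(y | x): the teacher's output distribution over the vocabulary.
p_t = F.softmax(teacher_logits / temperature, dim=-1)

# P^s_theta(y | x): the student's output distribution (log-probabilities).
log_p_s = F.log_softmax(student_logits / temperature, dim=-1)

# KL-divergence distillation loss, KL(P_t || P^s_theta): the student is
# pushed to match the teacher's full distribution over the vocabulary,
# not just the teacher's single most likely token.
kd_loss = F.kl_div(log_p_s, p_t, reduction="sum")

print("teacher distribution P_t:", p_t)
print("distillation loss:", kd_loss.item())
```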
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Distillation Loss for Response-Based Knowledge
Objective Function for Student Model Training via Knowledge Distillation
Definition of Teacher's Probability Distribution (Pt) in Knowledge Distillation
Definition of Student's Probability Distribution (P_theta^s)
General Loss Function for Knowledge Distillation
Optimizing a Language Model for Mobile Deployment
A research lab has developed a very large and complex language model that achieves state-of-the-art performance on a translation task. However, due to its size, the model is too slow and expensive to deploy for a real-time translation mobile app. To address this, the team uses the large model's predictions on a set of sentences to train a new, much smaller and faster model. What is the primary strategic advantage of this two-model approach?
A development team is using a knowledge distillation framework to create a compact, efficient language model (the 'student') from a much larger, high-performance model (the 'teacher'). The goal is to deploy the student model on devices with limited computational resources. Which statement best analyzes the typical relationship between the inputs processed by the teacher and student models during this process?
Learn After
KL Divergence Loss for Knowledge Distillation
Cross-Entropy Loss for Knowledge Distillation
A large, complex language model is used to generate target probabilities for training a smaller, more efficient model. For the input sentence 'The cat sat on the ___', the large model could produce different probability distributions for the next word. Which of the following distributions, representing $P_t$, would provide the most informative and nuanced training signal for the smaller model?
Value of the Teacher's Probability Distribution
In a knowledge distillation process for a machine translation task, a large 'teacher' model translates the sentence 'Je suis content' from French to English. Instead of just outputting 'I am happy', the teacher model produces a full probability distribution over the entire English vocabulary for the next words. Which statement best analyzes the significance of this probability distribution ($P_t$) for training the smaller 'student' model?