1Cademy - Definition of Student's Probability Distribution ($P_\theta^s$)

Learn Before

Model Parameterization by θ
Teacher-Student Model Architecture in Knowledge Distillation


Definition of Student's Probability Distribution ($P_\theta^s$)

In knowledge distillation, $P_\theta^s$ denotes the probability distribution that the student model assigns to its outputs. It is conditioned on a context $\mathbf{c}'$ and a latent variable $\mathbf{z}$, and it is parameterized by the student model's weights $\theta$. Formally, $P_\theta^s = \text{Pr}_\theta^s(\cdot \mid \mathbf{c}', \mathbf{z})$, where $\cdot$ stands for the output whose probability is being evaluated.
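To make the notation concrete, here is a minimal sketch that assumes a toy linear student. The names `ToyStudent`, `theta`, `c_prime`, and `z` are hypothetical and only illustrate the roles of $\theta$, $\mathbf{c}'$, and $\mathbf{z}$; a real student would typically be a neural network. The sketch shows that $P_\theta^s$ is a full distribution over outputs (the $\cdot$ slot), that it depends on both $\mathbf{c}'$ and $\mathbf{z}$, and that it is determined by the parameters $\theta$.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: turns raw scores into probabilities summing to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyStudent:
    """Hypothetical linear student model; `theta` plays the role of the weights θ.

    theta[k] is a weight vector for output class k, applied to the
    concatenation [c', z]. This is an illustrative assumption, not a
    prescribed architecture.
    """

    def __init__(self, theta: List[List[float]]):
        self.theta = theta

    def distribution(self, c_prime: Sequence[float], z: Sequence[float]) -> List[float]:
        """Return Pr_θ^s(· | c', z) as a list of probabilities over output classes."""
        # Condition on both the context c' and the latent variable z.
        x = list(c_prime) + list(z)
        logits = []
        for w in self.theta:
            if len(w) != len(x):
                raise ValueError("each row of theta must have length len(c') + len(z)")
            logits.append(sum(wi * xi for wi, xi in zip(w, x)))
        return softmax(logits)


# Usage: three output classes, a 2-dimensional context, and a 1-dimensional latent.
theta = [[0.5, -0.2, 1.0], [0.1, 0.3, -0.5], [0.0, 0.0, 0.0]]
student = ToyStudent(theta)
p_s = student.distribution(c_prime=[1.0, 2.0], z=[0.5])
print(p_s, sum(p_s))
```

Feeding the same context with a different latent $\mathbf{z}$, or using a different `theta`, produces a different distribution, which is exactly the dependence that $\text{Pr}_\theta^s(\cdot \mid \mathbf{c}', \mathbf{z})$ expresses.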