A large, complex language model is used to generate target probabilities for training a smaller, more efficient model. For the input sentence 'The cat sat on the ___', the large model could produce different probability distributions for the next word. Which of the following distributions would provide the most informative and nuanced training signal for the smaller model?
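The question turns on why a spread-out ("soft") teacher distribution is a richer training signal than a near one-hot ("hard") one. A minimal sketch, using hypothetical toy probabilities over an illustrative four-word vocabulary: the soft distribution has higher Shannon entropy, meaning it encodes the relative plausibility of the alternatives rather than just the single top word.

```python
import math

# Toy next-word vocabulary for "The cat sat on the ___" (illustrative only).
vocab = ["mat", "rug", "floor", "sky"]

# A near one-hot ("hard") target vs. a soft teacher distribution
# (both hypothetical numbers, chosen for illustration).
hard = [0.97, 0.01, 0.01, 0.01]
soft = [0.60, 0.25, 0.13, 0.02]

def entropy(p):
    # Shannon entropy in bits: higher entropy means the distribution
    # carries more information about plausible alternatives.
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# The soft distribution tells the student not only the best word,
# but also how plausible the runner-up words are.
print(entropy(hard) < entropy(soft))  # True
```

The distribution with meaningful probability mass on several plausible continuations (e.g. 'mat', 'rug', 'floor') is the one that transfers the teacher's "dark knowledge" about word similarity to the student.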
Ch.3 Prompting - Foundations of Large Language Models
Value of the Teacher's Probability Distribution
In a knowledge distillation process for a machine translation task, a large 'teacher' model translates the sentence 'Je suis content' ('I am happy') from French to English. Instead of outputting only 'I am happy', the teacher model produces a full probability distribution over the entire English vocabulary for each next word. Which statement best analyzes the significance of this probability distribution for training the smaller 'student' model?
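The significance of the full distribution is that the student can be trained to match it directly, typically with a KL-divergence loss over temperature-softened softmax outputs. A minimal sketch under assumed toy logits (the vocabulary, logit values, and temperature are all hypothetical, not from the source):

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature > 1 softens the distribution, exposing the teacher's
    # relative preferences among plausible alternatives.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p_teacher, q_student, eps=1e-12):
    # KL(P || Q): penalizes the student wherever it underweights
    # words the teacher considers plausible.
    return sum(p * math.log((p + eps) / (q + eps))
               for p, q in zip(p_teacher, q_student))

# Hypothetical logits for the next English word over a toy vocabulary
# ["happy", "glad", "content", "sad"].
teacher_logits = [4.0, 3.2, 2.5, -2.0]
student_logits = [3.0, 1.0, 1.0, 0.0]

T = 2.0  # assumed distillation temperature
soft_targets = softmax(teacher_logits, temperature=T)
student_probs = softmax(student_logits, temperature=T)
loss = kl_divergence(soft_targets, student_probs)  # scalar to minimize
```

Minimizing this loss pushes the student to reproduce not just the teacher's top choice ('happy') but its graded preferences over near-synonyms like 'glad' and 'content', which is exactly what a one-hot label cannot convey.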