A compact computational model is being trained to replicate the probabilistic outputs of a large, established reference model. The training process aims to minimize the dissimilarity between the two models' full output distributions for any given input. Below are the output probability distributions from the reference model and three potential outputs from the compact model for the same input.
Reference Model Output: [0.70, 0.20, 0.10]
Which of the compact model outputs below demonstrates the most successful replication of the reference model's output distribution, considering the goal is to match the entire distribution, not just the most likely outcome?
Compact Model - Output A: [0.65, 0.22, 0.13]
Compact Model - Output B: [0.70, 0.10, 0.20]
Compact Model - Output C: [0.50, 0.30, 0.20]
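One standard way to score this kind of full-distribution matching is the KL divergence used in knowledge distillation losses (as the related notes below suggest). As a minimal sketch, assuming forward KL(P ‖ Q) with natural logarithms and the reference model as P, the following Python snippet computes the divergence for each candidate output; the variable names are illustrative, not part of the original question.

    import math

    # Reference (teacher) distribution P and candidate (student) distributions Q.
    reference = [0.70, 0.20, 0.10]
    candidates = {
        "A": [0.65, 0.22, 0.13],
        "B": [0.70, 0.10, 0.20],
        "C": [0.50, 0.30, 0.20],
    }

    def kl_divergence(p, q):
        # Forward KL divergence: KL(P || Q) = sum_i p_i * log(p_i / q_i).
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    for name, q in candidates.items():
        print(f"Output {name}: KL(P || Q) = {kl_divergence(reference, q):.4f}")

Under this measure the approximate scores are A ≈ 0.007, B ≈ 0.069, and C ≈ 0.085, so Output A tracks the entire reference distribution most closely, even though Output B reproduces the top probability exactly.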
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
KL Divergence Loss for Knowledge Distillation
Rationale for Distribution Matching in Model Training
Knowledge Distillation Loss using KL Divergence
Analyzing Model Training Scenarios