Learn Before
Concept
Recent work
Teacher and student model setup are almost pre-fixed in size and structure during distillation, causing a model capacity gap. It is missing a way to design the architectures and an explanation for why their architectures are determined by these model setups.
0
1
Updated 2022-10-29
Tags
Deep Learning (in Machine learning)