Learn Before
Concept

Recent work

Teacher and student model setup are almost pre-fixed in size and structure during distillation, causing a model capacity gap. It is missing a way to design the architectures and an explanation for why their architectures are determined by these model setups.

0

1

Updated 2022-10-29

Tags

Deep Learning (in Machine learning)

Related