Concept
Adaptive Computation Time (ACT) in transformers
A promising modification is to make computation time depend on the input, i.e., to introduce Adaptive Computation Time (ACT), in place of the fixed computation procedure of the vanilla transformer, which spends the same amount of computation on every input regardless of difficulty. With ACT, the model can build a deeper, more refined representation for complex inputs and a shallower, cheaper one for easy inputs.
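To make the idea concrete, here is a minimal sketch of one common way to realize input-dependent depth: a per-token halting rule in the spirit of Graves's original ACT formulation. It is an illustration under stated assumptions, not a reference implementation. The function names, the `epsilon` threshold, and the idea that each layer emits one scalar halting logit per token are all assumptions introduced for this example.

```python
import math


def act_halting_weights(halt_logits, epsilon=0.01):
    """Return per-layer mixing weights for one token under an ACT-style rule.

    halt_logits: hypothetical halting logits, one per available layer.
    epsilon: assumed slack; the token halts once the accumulated halting
             probability reaches 1 - epsilon.
    """
    weights = []
    cumulative = 0.0
    for i, logit in enumerate(halt_logits):
        h = 1.0 / (1.0 + math.exp(-logit))  # halting probability at this layer
        is_last = i == len(halt_logits) - 1
        if cumulative + h >= 1.0 - epsilon or is_last:
            # Halt here: assign the remainder so the weights sum to 1.
            weights.append(1.0 - cumulative)
            break
        weights.append(h)
        cumulative += h
    return weights


def act_output(layer_states, halt_logits, epsilon=0.01):
    """Mix the per-layer states of one token using the halting weights.

    layer_states: list of equal-length vectors, one per layer.
    Returns the mixed state and the number of layers actually used.
    """
    weights = act_halting_weights(halt_logits, epsilon)
    dim = len(layer_states[0])
    out = [0.0] * dim
    for w, state in zip(weights, layer_states):
        for j in range(dim):
            out[j] += w * state[j]
    return out, len(weights)


# An "easy" token halts after one layer; a "hard" token uses all four.
easy = act_halting_weights([5.0, 0.0, 0.0, 0.0])
hard = act_halting_weights([-2.0, -2.0, -2.0, -2.0])
print(len(easy), len(hard))
```

In this sketch the number of weights returned is the number of layers the token actually consumed, which is where the efficiency gain for easy inputs would come from. A trained model would typically also add a penalty on the number of steps so that it does not always use the maximum depth; that part is omitted here.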

Updated 2022-05-26
Tags
Data Science