Concept
Subsequent Masking (Deep Attentive Study Session Dropout Prediction in Mobile Learning Environment)
In the original Transformer, the subsequent (look-ahead) mask is applied only in the decoder's self-attention. This model instead applies masking to all multi-head attention layers. Because the task needs only information from previous interactions, every layer is masked so that no position can attend to later positions.
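A minimal NumPy sketch of the idea, under simplifying assumptions: single-head attention with no learned projections, layer normalization, or feed-forward sublayers, and an assumed encoder-decoder layout in which the same subsequent mask is reused in the encoder self-attention, the decoder self-attention, and the encoder-decoder attention. The function names (`subsequent_mask`, `masked_attention`, `toy_encoder_decoder`) are hypothetical and not taken from the paper. The mask lets position i attend only to positions j ≤ i, and disallowed scores are set to -inf before the softmax.

```python
import numpy as np

def subsequent_mask(size):
    """Boolean (size, size) mask: True where position i may attend to position j (j <= i)."""
    return np.tril(np.ones((size, size), dtype=bool))

def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention with a boolean mask."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)
    # Future positions get -inf, so they receive zero weight after the softmax.
    scores = np.where(mask, scores, -np.inf)
    # Subtract the row max for numerical stability (the diagonal is always allowed,
    # so every row has at least one finite score).
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

def toy_encoder_decoder(enc_in, dec_in, n_layers=2):
    """Hypothetical toy stack: the subsequent mask is used in EVERY attention layer."""
    mask = subsequent_mask(enc_in.shape[0])
    enc = enc_in
    for _ in range(n_layers):
        # Masked encoder self-attention with a residual connection.
        enc = enc + masked_attention(enc, enc, enc, mask)[0]
    dec = dec_in
    for _ in range(n_layers):
        # Masked decoder self-attention.
        dec = dec + masked_attention(dec, dec, dec, mask)[0]
        # Masked encoder-decoder attention: query i sees encoder outputs 0..i only.
        dec = dec + masked_attention(dec, enc, enc, mask)[0]
    return dec
```

Because every attention layer uses the mask, the output at position t depends only on inputs at positions 0..t: perturbing later inputs leaves earlier outputs unchanged. If the encoder-decoder attention were left unmasked, as in the original Transformer, decoder position t could read encoder outputs for later interactions.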
Updated 2021-01-15
Tags
Data Science