Concept

Subsequent Masking (Deep Attentive Study Session Dropout Prediction in Mobile Learning Environment)

In this model, unlike the original transformer, masking is applied to all multi-head attention layers. At each step the prediction may only use information from previous interactions, i.e., $e_1, \cdots, e_n$ and $l_1, \cdots, l_{n-1}$, which is why the subsequent mask is used in every layer rather than only in the decoder.
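The paper does not publish code, but the mechanism is the standard causal mask. A minimal sketch in PyTorch (the function names here are illustrative, not from the paper): positions above the diagonal are blocked before the softmax, so position $i$ can only attend to positions $j \le i$, and the same mask would be passed to every multi-head attention layer.

```python
import math
import torch
import torch.nn.functional as F

def subsequent_mask(n: int) -> torch.Tensor:
    """Boolean (n, n) mask: True where position i may attend to position j <= i."""
    return torch.tril(torch.ones(n, n, dtype=torch.bool))

def masked_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Scaled dot-product attention with a subsequent (causal) mask.

    q, k, v: tensors of shape (batch, seq_len, d_k).
    """
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)        # (batch, n, n)
    mask = subsequent_mask(q.size(-2)).to(scores.device)
    scores = scores.masked_fill(~mask, float("-inf"))        # hide future positions
    return F.softmax(scores, dim=-1) @ v
```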

Updated 2021-01-15

Tags

Data Science