Concept

Subsequent Masking (Deep Attentive Study Session Dropout Prediction in Mobile Learning Environment)

Unlike the original Transformer, which applies a subsequent mask only in the decoder's self-attention, the proposed model applies masking to all multi-head attention layers. This is because a prediction for this problem may use only information from previous interactions, i.e., $e_1, \cdots, e_n$ and $l_1, \cdots, l_{n-1}$, so no attention layer is allowed to look at later positions.
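A minimal sketch of how a subsequent (causal) mask restricts attention, written in plain Python for illustration. The helper names `subsequent_mask` and `masked_softmax` are hypothetical and not taken from the paper. The sketch assumes position $i$ may attend to positions $j \le i$; the exact offset the paper uses for the label sequence is not stated here.

```python
import math


def subsequent_mask(n):
    """Return an n x n boolean matrix.

    Entry [i][j] is True when position i may attend to position j,
    which in this sketch means j <= i (assumed convention).
    """
    return [[j <= i for j in range(n)] for i in range(n)]


def masked_softmax(scores, mask):
    """Row-wise softmax that gives zero weight to masked-out positions."""
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        # Subtract the largest allowed score for numerical stability.
        peak = max(s for s, m in zip(row_scores, row_mask) if m)
        exps = [
            math.exp(s - peak) if m else 0.0
            for s, m in zip(row_scores, row_mask)
        ]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Uniform scores make the effect of the mask easy to see:
# each row spreads its weight evenly over the positions it may attend to.
mask = subsequent_mask(3)
scores = [[0.0] * 3 for _ in range(3)]
weights = masked_softmax(scores, mask)
```

In the proposed model the same kind of mask would be passed to every multi-head attention layer rather than to the decoder's self-attention alone, so that no layer can draw on later interactions.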


Updated 2026-05-17

Tags

Data Science