Concept
Subsequent Masking (Deep Attentive Study Session Dropout Prediction in Mobile Learning Environment)
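Unlike the original Transformer model, which applies subsequent masking only in the decoder's self-attention, the proposed model applies masking in every multi-head attention layer. Because this problem may use only information from previous interactions (i.e., and ), masking is used across all layers so that no position can attend to interactions that come after it.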
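As an illustration, here is a minimal NumPy sketch of applying one subsequent mask to every attention layer. It assumes a standard encoder-decoder Transformer layout in which the encoder and decoder sequences have the same length, so a single lower-triangular mask serves encoder self-attention, decoder self-attention, and encoder-decoder attention. The function names (`subsequent_mask`, `masked_mha`, `das_style_forward`), the layer counts, the dimensions, and the random weights are hypothetical. Embeddings, feed-forward sublayers, and layer normalization are omitted. This is not the paper's implementation.

```python
import numpy as np

def subsequent_mask(seq_len):
    # True where query position i may attend to key position j, i.e. j <= i.
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def masked_mha(x_q, x_kv, params, num_heads, mask):
    """Multi-head scaled dot-product attention with a subsequent mask (hypothetical helper)."""
    w_q, w_k, w_v, w_o = params
    seq_len, d_model = x_q.shape
    d_head = d_model // num_heads

    def split(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(t.shape[0], num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x_q @ w_q), split(x_kv @ w_k), split(x_kv @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    scores = np.where(mask, scores, -np.inf)      # block attention to later positions
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability (diagonal is never masked)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    out = (weights @ v).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ w_o, weights

def das_style_forward(enc_x, dec_x, layers, num_heads):
    """Hypothetical encoder-decoder pass where every attention call gets the same mask."""
    mask = subsequent_mask(enc_x.shape[0])
    for p in layers["encoder"]:
        attn, _ = masked_mha(enc_x, enc_x, p, num_heads, mask)  # masked encoder self-attention
        enc_x = enc_x + attn                                    # residual connection only
    for p_self, p_cross in layers["decoder"]:
        attn, _ = masked_mha(dec_x, dec_x, p_self, num_heads, mask)   # masked decoder self-attention
        dec_x = dec_x + attn
        attn, _ = masked_mha(dec_x, enc_x, p_cross, num_heads, mask)  # masked encoder-decoder attention
        dec_x = dec_x + attn
    return dec_x

# Usage with random, illustrative weights.
rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 5, 8, 2

def rand_params():
    return [rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(4)]

layers = {"encoder": [rand_params() for _ in range(2)],
          "decoder": [(rand_params(), rand_params()) for _ in range(2)]}
enc_x = rng.normal(size=(seq_len, d_model))
dec_x = rng.normal(size=(seq_len, d_model))
out = das_style_forward(enc_x, dec_x, layers, num_heads)
```

Because every attention layer is masked, the output at position i depends only on inputs at positions up to i. Perturbing the last interaction therefore leaves all earlier outputs unchanged. If the encoder-decoder attention were left unmasked, earlier decoder positions could still read later interactions through the encoder, and this property would break.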
Updated 2026-05-17
Tags
Data Science