Formula
Feature Count-Based Learning Rate Adjustment
To address the learning rate dilemma for sparse features, one approach is to adjust the learning rate based on how often each feature occurs. Instead of a global time-based decay $\eta = \frac{\eta_0}{\sqrt{t + c}}$, a feature-specific rate $\eta_i = \frac{\eta_0}{\sqrt{s(i, t) + c}}$ can be used, where $s(i, t)$ counts the number of nonzeros for feature $i$ observed up to time $t$. Frequently seen features thus get a rapidly decreasing rate, while rarely seen features keep a larger one. However, this method fails for data that is not strictly sparse but instead has gradients that are mostly very small and only rarely large, because there is no clear threshold for when a feature counts as observed.
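The count-based rule can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `count_based_sgd_step`, the defaults `eta0=0.1` and `c=1.0`, and the choice to treat any exactly nonzero gradient entry as an "observation" are all assumptions made for the example.

```python
import numpy as np


def count_based_sgd_step(w, grad, counts, eta0=0.1, c=1.0):
    """One SGD step with per-feature rates eta0 / sqrt(s(i, t) + c).

    `counts` holds s(i, t): how many times each feature's gradient
    entry has been nonzero so far (an assumed definition of "observed").
    """
    # Update s(i, t): a feature is counted only when its gradient is nonzero.
    counts = counts + (grad != 0)
    # Per-feature learning rate; rarely seen features keep a larger rate.
    lr = eta0 / np.sqrt(counts + c)
    # Standard gradient step, scaled coordinate-wise.
    w = w - lr * grad
    return w, counts


# Feature 0 appears in every step, feature 1 only in the last one.
w = np.zeros(2)
counts = np.zeros(2)
for grad in ([1.0, 0.0], [1.0, 0.0], [1.0, 1.0]):
    w, counts = count_based_sgd_step(w, np.array(grad), counts)
```

After the loop, feature 0 has been counted three times and feature 1 once, so feature 1 is still updated with the larger rate. The sketch also makes the weakness visible: a gradient entry of `1e-12` would increment the count exactly like a large one, which is why the scheme breaks down when gradients are small rather than exactly zero.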
Updated 2026-05-15
Tags
D2L
Dive into Deep Learning @ D2L