Formula

Feature Count-Based Learning Rate Adjustment

To address the learning rate dilemma for sparse features, one approach is to adjust the learning rate according to how often each feature occurs. Instead of a global time-based decay $\eta = \frac{\eta_0}{\sqrt{t + c}}$, a feature-specific rate $\eta_i = \frac{\eta_0}{\sqrt{s(i, t) + c}}$ can be used, where $s(i, t)$ counts the number of nonzero values observed for feature $i$ up to time $t$. Frequently observed features then see their rate decay quickly, while rarely observed features keep a larger rate. However, this method breaks down when the data is not strictly sparse but instead has gradients that are mostly very small and only rarely large, because there is no clear threshold at which a feature should count as observed.
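
The per-feature rate can be illustrated with a minimal NumPy sketch. The function name `count_based_sgd_step`, the default values of `eta0` and `c`, and the choice to increment the counter before computing the rate are all assumptions made for illustration; the source does not fix them.

```python
import numpy as np

def count_based_sgd_step(w, grad, counts, eta0=0.1, c=1.0):
    """One SGD step with per-feature rates eta_i = eta0 / sqrt(s(i, t) + c).

    Hypothetical helper: `counts` holds s(i, t), the number of nonzero
    gradient entries seen so far for each feature i.
    """
    # Assumption: a feature counts as "observed" iff its gradient is nonzero.
    counts = counts + (grad != 0)
    # Per-feature learning rate; rarely observed features keep a larger rate.
    eta = eta0 / np.sqrt(counts + c)
    return w - eta * grad, counts

w = np.zeros(3)
counts = np.zeros(3)
grads = [
    np.array([1.0, 0.0, 0.0]),  # only feature 0 is active
    np.array([1.0, 0.0, 0.5]),  # features 0 and 2 are active
    np.array([1.0, 0.0, 0.0]),  # only feature 0 is active
]
for g in grads:
    w, counts = count_based_sgd_step(w, g, counts)

# Rates implied by the final counts (same defaults as above).
rates = 0.1 / np.sqrt(counts + 1.0)
```

In this sketch the test `grad != 0` is exactly the threshold the text describes as problematic: with gradients that are tiny but nonzero, every feature would be counted at every step and the per-feature rates would collapse back to the global schedule.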

Updated 2026-05-15

Tags

D2L

Dive into Deep Learning @ D2L
