
Learning Rate Dilemma for Sparse Features

When training models on sparse features, a standard decaying learning rate such as $\mathcal{O}(t^{-\frac{1}{2}})$, where $t$ is the number of update steps, creates an optimization dilemma. If the learning rate decays too quickly, it is already small by the time an infrequent feature finally appears, so the parameters for that feature are not updated enough to reach their optimal values. Conversely, if the learning rate decays slowly enough to accommodate infrequent features, the parameters for common features converge too slowly.
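The imbalance can be made concrete with a small numerical sketch. It assumes a schedule of exactly $\eta_0 / \sqrt{t}$ with $\eta_0 = 1$, a common feature that is active at every step, and a rare feature that is active once every 1000 steps. These numbers and the helper names `global_lr` and `cumulative_step_size` are illustrative assumptions, not part of the original text.

```python
import math


def global_lr(t, eta0=1.0):
    """Step size eta0 / sqrt(t) at global step t (1-indexed)."""
    return eta0 / math.sqrt(t)


def cumulative_step_size(period, total_steps, eta0=1.0):
    """Sum the step sizes over the steps at which a feature is active.

    Assumption for illustration: the feature is active at steps
    period, 2 * period, 3 * period, ... up to total_steps.
    """
    return sum(global_lr(t, eta0) for t in range(period, total_steps + 1, period))


total_steps = 10_000

# Common feature: active at every step, so it uses the whole schedule.
common_total = cumulative_step_size(1, total_steps)

# Rare feature: active once every 1000 steps (hypothetical frequency).
rare_total = cumulative_step_size(1000, total_steps)

# Step size the rare feature sees at its very first update.
first_rare_lr = global_lr(1000)

print(f"common feature, total step size: {common_total:.2f}")
print(f"rare feature, total step size:   {rare_total:.4f}")
print(f"rare feature, first step size:   {first_rare_lr:.4f}")
```

Under these assumptions the rare feature's first update already uses a step size of about 0.03 instead of 1, and its accumulated step size over the run is more than a hundred times smaller than that of the common feature. Raising `eta0` or slowing the decay helps the rare feature but enlarges every step for the common feature as well, which is the dilemma described above.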


Updated 2026-05-15


Tags

D2L

Dive into Deep Learning @ D2L