When sampling hyperparameter values at random to find the best ones, it is sometimes better to draw them from a specific distribution. For example, instead of sampling from a uniform distribution, it may be more reasonable to sample from a normal distribution or on a logarithmic scale. When choosing a value for:
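- $0.0001 \le \alpha \le 1$, it's more reasonable to search on a log scale, for example by trying powers of 10 such as $10^{-4}, 10^{-3}, \dots, 10^{0}$, so that each order of magnitude gets equal attention.
- $0.9 \le \beta \le 0.9999$, it's more reasonable to try values like 0.9, 0.99, 0.999, 0.9999, because $\beta$ averages over roughly $1/(1-\beta)$ samples. Both 0.9000 and 0.9005 average over about 10 samples, so the difference between them hardly matters. On the other hand, 0.999 averages over about 1000 samples, while 0.9995 averages over about 2000.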
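
The two cases above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: it assumes the exponent is drawn uniformly (so the trials are spread over a continuous log scale rather than restricted to exact powers of 10), and it assumes $\beta$ is the coefficient of an exponentially weighted average, so that $1/(1-\beta)$ approximates the number of samples averaged. The helper names `sample_alpha`, `sample_beta`, and `effective_window` are made up for this example.

```python
import random


def sample_alpha(rng, low_exp=-4.0, high_exp=0.0):
    """Draw alpha = 10**r with r uniform in [low_exp, high_exp] (assumed range)."""
    return 10 ** rng.uniform(low_exp, high_exp)


def sample_beta(rng, low_exp=-4.0, high_exp=-1.0):
    """Draw beta by sampling 1 - beta on a log scale, giving beta in about [0.9, 0.9999]."""
    return 1 - 10 ** rng.uniform(low_exp, high_exp)


def effective_window(beta):
    """Approximate number of samples averaged, assuming an exponentially weighted average."""
    return 1 / (1 - beta)


rng = random.Random(0)  # fixed seed so the draws are reproducible
alphas = [sample_alpha(rng) for _ in range(5)]
betas = [sample_beta(rng) for _ in range(5)]

print("alphas:", alphas)
print("betas:", betas)

# Compare how much a small change in beta matters at each end of the range.
for beta in (0.9000, 0.9005, 0.999, 0.9995):
    print(beta, "->", round(effective_window(beta), 2), "samples")
```

Sampling $1-\beta$ rather than $\beta$ itself puts the resolution near 1, which is where small changes in $\beta$ have the largest effect.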

University of Michigan - Ann Arbor

Grid search can be very inefficient because it requires training the model once for every combination of hyperparameter values in order to compute and compare the dev error.
A more efficient method is to evaluate only a random subset of combinations from the grid and compare the dev error on those.
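
As a rough sketch of that idea, the Python below draws a random subset of combinations from a grid and keeps the one with the lowest dev error. Everything here is illustrative: `random_grid_search` is a hypothetical helper, the grid values are examples, and `toy_dev_error` is a made-up stand-in for training a model and measuring its real dev error.

```python
import itertools
import math
import random


def random_grid_search(grid, dev_error, n_trials, seed=0):
    """Evaluate n_trials random combinations from the grid; return the best one found."""
    rng = random.Random(seed)
    names = list(grid)
    # All combinations of the full grid (what exhaustive grid search would train on).
    combos = list(itertools.product(*(grid[name] for name in names)))
    # Random subset, drawn without replacement.
    sampled = rng.sample(combos, min(n_trials, len(combos)))
    results = []
    for combo in sampled:
        params = dict(zip(names, combo))
        results.append((dev_error(params), params))
    best_error, best_params = min(results, key=lambda r: r[0])
    return best_error, best_params, len(sampled), len(combos)


def toy_dev_error(params):
    """Hypothetical stand-in for 'train the model, then compute the dev error'."""
    return (math.log10(params["alpha"]) + 2) ** 2 + (params["beta"] - 0.99) ** 2


grid = {
    "alpha": [1e-4, 1e-3, 1e-2, 1e-1, 1.0],
    "beta": [0.9, 0.99, 0.999, 0.9999],
}

best_error, best_params, n_evaluated, n_total = random_grid_search(
    grid, toy_dev_error, n_trials=8
)
print(f"evaluated {n_evaluated} of {n_total} combinations")
print("best params:", best_params, "dev error:", best_error)
```

With 5 values of `alpha` and 4 of `beta`, exhaustive grid search would train 20 models, while this sketch trains only 8. The trade-off is that the best combination in the grid may not be among those sampled.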
