Learn Before
  • Popular Regularization Techniques in Deep Learning

L2 Regularization (Weight Decay) in Deep Learning

L2 regularization, also known as weight decay, adds a penalty term to the cost function equal to the sum of the squared weight values, scaled by $\frac{\lambda}{2m}$. It is called weight decay because it penalizes the growth of the weights while the cost function is being minimized. From a Bayesian perspective, L2 regularization corresponds to a Gaussian prior distribution on the weights.

$$J(w, b) = \frac{1}{m}\sum_{i=1}^{m} \mathcal{L}(\hat{y}^{i}, y^{i}) + \frac{\lambda}{2m} \|w\|_2^2$$
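One way to see why this is called "decay" (a sketch, assuming a plain gradient-descent update with learning rate $\alpha$, where $dw^{[l]}$ denotes the gradient of the unregularized loss): the penalty contributes $\frac{\lambda}{m} w^{[l]}$ to the gradient, so every step first shrinks the weights by a constant factor and then applies the data gradient.

$$w^{[l]} := w^{[l]} - \alpha \left( dw^{[l]} + \frac{\lambda}{m} w^{[l]} \right) = \left( 1 - \frac{\alpha \lambda}{m} \right) w^{[l]} - \alpha \, dw^{[l]}$$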

$$\|w^{[l]}\|^2 = \sum_{i=1}^{n^{[l]}} \sum_{j=1}^{n^{[l-1]}} \left(w_{i,j}^{[l]}\right)^2$$

The rows $i$ of the weight matrix correspond to the neurons in the current layer $n^{[l]}$, whereas the columns $j$ correspond to the neurons in the previous layer $n^{[l-1]}$.
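As a concrete illustration, here is a minimal NumPy sketch of the regularized cost above; the function name, weight shapes, and numbers are illustrative assumptions, not taken from any particular library.

```python
import numpy as np

def l2_regularized_cost(data_cost, weights, lam, m):
    """Add the L2 (weight-decay) penalty to an unregularized cost.

    data_cost : float, the data term (1/m) * sum of per-example losses
    weights   : list of per-layer weight matrices w[l]
    lam       : regularization rate lambda
    m         : number of training examples
    """
    # Squared norm of each layer's weight matrix, summed over all layers
    l2_penalty = sum(np.sum(np.square(w)) for w in weights)
    return data_cost + (lam / (2 * m)) * l2_penalty

# Hypothetical two-layer network: each weight matrix has shape (n[l], n[l-1])
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((1, 4))]
print(l2_regularized_cost(data_cost=0.35, weights=weights, lam=0.7, m=100))
```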

Tags

Data Science

Related
  • Data Augmentation in Deep Learning

  • Early Stopping in Deep Learning

  • Dropout Regularization in Deep Learning

  • L2 Regularization (Weight Decay) in Deep Learning

  • Which of these techniques are useful for reducing variance (reducing overfitting)?

  • L1 Regularization in Deep Learning

  • ElasticNet Regression

  • If your Neural Network model seems to have high variance, what of the following would be promising things to try?

  • Regularization in ML and DL

  • Bagging in Deep Learning

  • Dropout in Deep Learning

  • Normalization of Data

  • Tangent Distance Algorithm

  • Tangent Propagation Algorithm

  • Manifold Tangent Classifier

  • Boosting in Deep Learning

  • Appropriate Regularization/ Representation

Learn After
  • Frobenius and L2

  • Ridge Regression

  • What is weight decay?

  • λ: Regularization Rate in Deep Learning

  • Gaussian (Normal) Distribution