Suppose you have built a neural network and you decide to initialize all of its weights and biases to zero. What goes wrong?

For a neural network, you shouldn't initialize the weight parameters to all zeros.
Why is that?
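Let's take an example: you have two input features, x1 and x2, and two hidden units, so this layer's weight matrix W is a 2 by 2 matrix.
If we initialize W (and the weights of the next layer) to all zeros, then a1 = a2 in forward propagation and dz1 = dz2 in backpropagation, so the two hidden units compute exactly the same thing. Because they also receive identical gradient updates, they remain identical after every iteration of gradient descent: no matter how long you train, the two hidden units compute the same function. The same argument holds for any number of hidden units, so the extra units add nothing.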
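The symmetry argument can be checked numerically. The NumPy sketch below assumes a tanh hidden layer, a single sigmoid output unit, cross-entropy loss, random toy data and a learning rate of 0.5. None of these choices come from the example above. At every gradient-descent step, the code asserts that the two hidden units have equal activations (a1 = a2) and equal dz values (dz1 = dz2).

```python
import numpy as np

# Toy setup (assumed): 2 input features, 2 tanh hidden units, 1 sigmoid output
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 5))                      # 5 examples, one per column
Y = rng.integers(0, 2, size=(1, 5)).astype(float)    # random 0/1 labels
m = X.shape[1]

# All-zero initialization of every weight and bias
W1, b1 = np.zeros((2, 2)), np.zeros((2, 1))
W2, b2 = np.zeros((1, 2)), np.zeros((1, 1))

for step in range(10):
    # Forward propagation
    A1 = np.tanh(W1 @ X + b1)
    A2 = 1.0 / (1.0 + np.exp(-(W2 @ A1 + b2)))
    # Backward propagation for the cross-entropy loss
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.mean(axis=1, keepdims=True)
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.mean(axis=1, keepdims=True)
    # Hidden units 1 and 2 have identical activations and identical dz
    assert np.allclose(A1[0], A1[1]) and np.allclose(dZ1[0], dZ1[1])
    # Gradient-descent update (learning rate 0.5 is an assumption)
    W1 -= 0.5 * dW1; b1 -= 0.5 * db1
    W2 -= 0.5 * dW2; b2 -= 0.5 * db2

print(W1)  # the two rows (one per hidden unit) are still identical
```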

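To solve this problem, we can initialize the weights randomly:
    W = np.random.randn(m, n) * 0.01
Multiplying by 0.01 keeps the initial weights small. If the weights were large, z = Wx + b would also be large. Tanh or sigmoid activations would then start in their flat, saturated regions, where the gradients are close to zero and learning is slow. The biases don't have the symmetry problem, so they can still be initialized to zero, e.g. b = np.zeros((m, 1)).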
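Here is a minimal sketch of this initialization for a two-layer network. The layer sizes n_x (inputs), n_h (hidden units) and n_y (outputs) and the helper name initialize_parameters are hypothetical. The sketch assumes the convention that each W has shape (units in this layer, units in the previous layer). It uses NumPy's Generator API (`default_rng`, `standard_normal`), which samples from the same standard normal distribution as `np.random.randn`.

```python
import numpy as np

def initialize_parameters(n_x, n_h, n_y, scale=0.01, seed=None):
    """Small random weights (to break symmetry) and zero biases."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.standard_normal((n_h, n_x)) * scale,
        "b1": np.zeros((n_h, 1)),
        "W2": rng.standard_normal((n_y, n_h)) * scale,
        "b2": np.zeros((n_y, 1)),
    }

params = initialize_parameters(n_x=2, n_h=4, n_y=1, seed=1)
for name, value in params.items():
    print(name, value.shape)
# W1 (4, 2)
# b1 (4, 1)
# W2 (1, 4)
# b2 (1, 1)
```

The `scale` argument plays the role of the 0.01 factor. Only the weights need randomness. Zero biases are harmless once the weights differ.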



University of Michigan - Ann Arbor

• The main steps for building a Neural Network are:

     – Define the model structure (such as number of input features and outputs)

     – Initialize the model’s weights and biases.

     – Loop.

           ∗ Calculate current loss (forward propagation)

           ∗ Calculate current gradient (backward propagation)

           ∗ Update parameters (gradient descent)

• Preprocessing the dataset is important.

• Tuning the learning rate (which is an example of a “hyperparameter”) can make a big difference to the algorithm.
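The steps above can be sketched as a small NumPy training loop. This is an illustration under stated assumptions, not the course's implementation. It assumes a two-layer network (tanh hidden layer, sigmoid output), cross-entropy loss and full-batch gradient descent. The function names nn_model and predict, the toy dataset, n_h=4, learning_rate=0.5 and num_iterations=1000 are all hypothetical choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_model(X, Y, n_h=4, learning_rate=0.5, num_iterations=1000, seed=0):
    """Two-layer network trained with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    # 1. Define the model structure
    n_x, n_y, m = X.shape[0], Y.shape[0], X.shape[1]
    # 2. Initialize weights (small random values) and biases (zeros)
    W1 = rng.standard_normal((n_h, n_x)) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = rng.standard_normal((n_y, n_h)) * 0.01
    b2 = np.zeros((n_y, 1))
    costs = []
    # 3. Loop
    for _ in range(num_iterations):
        # Forward propagation and current cross-entropy loss
        A1 = np.tanh(W1 @ X + b1)
        A2 = sigmoid(W2 @ A1 + b2)
        A2c = np.clip(A2, 1e-12, 1 - 1e-12)  # avoid log(0)
        costs.append(float(-np.mean(Y * np.log(A2c) + (1 - Y) * np.log(1 - A2c))))
        # Backward propagation: current gradients
        dZ2 = A2 - Y
        dW2 = dZ2 @ A1.T / m
        db2 = dZ2.sum(axis=1, keepdims=True) / m
        dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
        dW1 = dZ1 @ X.T / m
        db1 = dZ1.sum(axis=1, keepdims=True) / m
        # Update parameters (gradient descent)
        W1 -= learning_rate * dW1
        b1 -= learning_rate * db1
        W2 -= learning_rate * dW2
        b2 -= learning_rate * db2
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}, costs

def predict(params, X):
    A1 = np.tanh(params["W1"] @ X + params["b1"])
    return (sigmoid(params["W2"] @ A1 + params["b2"]) > 0.5).astype(float)

# Toy data (assumed): label is 1 when feature 0 exceeds feature 1
rng = np.random.default_rng(42)
X_raw = rng.standard_normal((2, 200)) * 5 + 3
Y = (X_raw[0] > X_raw[1]).astype(float).reshape(1, -1)
# Preprocessing: standardize each feature to zero mean and unit variance
X = (X_raw - X_raw.mean(axis=1, keepdims=True)) / X_raw.std(axis=1, keepdims=True)

params, costs = nn_model(X, Y)
accuracy = float((predict(params, X) == Y).mean())
print(f"cost {costs[0]:.3f} -> {costs[-1]:.3f}, training accuracy {accuracy:.2f}")
```

Because `learning_rate` is a plain argument, you can try several values (for example 0.05 and 0.5) and compare the recorded `costs` curves. This is one simple way to tune the hyperparameter.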

Main Steps to Build a NN

https://www.coursera.org/learn/neural-networks-deep-learning?specialization=deep-learning

Neural Networks and Deep Learning

Random Initialization

Each unit is initialized with a fixed number k of nonzero incoming weights, while all of its other incoming weights are set to zero. Gradient descent may take a long time to shrink nonzero weights that start out overly large, but the scheme can increase diversity among the units at initialization time.
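The sparse scheme just described can be sketched as follows. The sketch assumes that each unit (one row of W) receives exactly k nonzero incoming weights at randomly chosen positions. It also assumes those weights are drawn from a Gaussian. The function name sparse_init and its scale parameter are hypothetical.

```python
import numpy as np

def sparse_init(n_in, n_out, k, scale=1.0, rng=None):
    """Return an (n_out, n_in) matrix with exactly k nonzero weights per row."""
    if not 0 < k <= n_in:
        raise ValueError("k must be between 1 and n_in")
    rng = np.random.default_rng() if rng is None else rng
    W = np.zeros((n_out, n_in))
    for i in range(n_out):
        # Choose k distinct input positions for unit i
        idx = rng.choice(n_in, size=k, replace=False)
        # Give those connections Gaussian values; all others stay zero
        W[i, idx] = rng.standard_normal(k) * scale
    return W

W = sparse_init(n_in=100, n_out=20, k=15, rng=np.random.default_rng(0))
print(W.shape, [int(np.count_nonzero(row)) for row in W[:3]])
```

Each unit sees a different random subset of the inputs, which is where the extra diversity comes from.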

Sparse Initialization

Xavier initialization can also be adapted for sampling weights from a uniform distribution instead of a Gaussian one. A uniform distribution $$U(-a, a)$$ has a variance of $$\frac{a^2}{3}$$. By setting this equal to the Xavier variance condition $$\sigma^2 = \frac{2}{n_\textrm{in} + n_\textrm{out}}$$ and solving for $$a$$, we obtain $$a = \sqrt{\frac{6}{n_\textrm{in} + n_\textrm{out}}}$$. Therefore, the uniform version of Xavier initialization samples weights according to the distribution $$U\left(-\sqrt{\frac{6}{n_\textrm{in} + n_\textrm{out}}}, \sqrt{\frac{6}{n_\textrm{in} + n_\textrm{out}}}\right)$$.
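Here is a minimal NumPy sketch of the uniform Xavier rule derived above. It computes a = sqrt(6 / (n_in + n_out)) and samples from U(-a, a). The function name xavier_uniform and the (n_out, n_in) weight shape are assumptions. For a large layer, the empirical variance of the samples should be close to 2 / (n_in + n_out).

```python
import numpy as np

def xavier_uniform(n_in, n_out, rng=None):
    """Sample an (n_out, n_in) weight matrix from U(-a, a), a = sqrt(6 / (n_in + n_out))."""
    rng = np.random.default_rng() if rng is None else rng
    a = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-a, a, size=(n_out, n_in))

n_in, n_out = 400, 600
W = xavier_uniform(n_in, n_out, rng=np.random.default_rng(0))
target_var = 2.0 / (n_in + n_out)   # Xavier variance condition
print(W.var(), target_var)          # the two values should be close
```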
