Log-Likelihood Gradient
The gradient of the log-likelihood used in maximum likelihood estimation can be decomposed into the following:

$$\nabla_{\theta} \log p(\mathbf{x}; \theta) = \nabla_{\theta} \log \tilde{p}(\mathbf{x}; \theta) - \nabla_{\theta} \log Z(\theta)$$

where $\tilde{p}(\mathbf{x}; \theta)$ is the unnormalized probability density and $Z(\theta)$ is the partition function, so that $p(\mathbf{x}; \theta) = \tilde{p}(\mathbf{x}; \theta) / Z(\theta)$. This is well known as the decomposition into the positive phase and negative phase of learning. Because the partition function depends on the parameters, learning such models by maximum likelihood is particularly difficult.
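A minimal sketch of this decomposition, assuming a toy discrete energy-based model with $\tilde{p}(x; \theta) = \exp(\theta_x)$ over a handful of states (the model, parameter names, and setup below are illustrative assumptions, not from the original note). Because the state space is tiny, $Z(\theta)$ is tractable, and the standard identity $\nabla_{\theta} \log Z(\theta) = \mathbb{E}_{x \sim p(x; \theta)}[\nabla_{\theta} \log \tilde{p}(x; \theta)]$ can be checked directly: the gradient of the log-likelihood is the positive-phase term at the observed data minus the negative-phase expectation under the model.

```python
import numpy as np

# Illustrative sketch (assumed setup): a discrete energy-based model over K
# states with unnormalized log-probabilities given by a parameter vector
# theta, i.e. p~(x; theta) = exp(theta[x]).

rng = np.random.default_rng(0)
K = 5                       # number of discrete states
theta = rng.normal(size=K)  # model parameters
x = 2                       # a single observed data point (state index)

def log_unnormalized(theta):
    # log p~(x; theta) for every state x
    return theta

def log_partition(theta):
    # log Z(theta) = log sum_x exp(theta[x]); tractable here because K is tiny
    return np.log(np.sum(np.exp(theta)))

# Positive phase: gradient of log p~(x; theta) w.r.t. theta.
# For this model it is a one-hot vector at the observed state.
positive_phase = np.eye(K)[x]

# Negative phase: gradient of log Z(theta), which equals the expectation of
# the positive-phase gradient under the model distribution p(x; theta).
model_probs = np.exp(theta - log_partition(theta))
negative_phase = model_probs  # E_{x ~ p}[ one_hot(x) ]

grad_log_likelihood = positive_phase - negative_phase

# Check against a numerical gradient of log p(x; theta).
eps = 1e-6
numerical = np.zeros(K)
for i in range(K):
    d = np.zeros(K)
    d[i] = eps
    f_plus = log_unnormalized(theta + d)[x] - log_partition(theta + d)
    f_minus = log_unnormalized(theta - d)[x] - log_partition(theta - d)
    numerical[i] = (f_plus - f_minus) / (2 * eps)

print(np.allclose(grad_log_likelihood, numerical, atol=1e-5))  # True
```

In realistic undirected models the negative phase is intractable, which is why it is typically approximated by sampling from the model (e.g. via MCMC); this is exactly where the difficulty of maximum likelihood training comes from.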
Tags
Data Science
Related
Relationship between KL Divergence and MLE
Cross-entropy loss
Mean Squared Error
The property of consistency of maximum likelihood
Statistical Efficiency Principle of MLE
Maximum Likelihood Estimator Properties
Log-Likelihood Gradient
Maximum Likelihood Training Objective for a Dataset of Sequences
Kullback-Leibler Divergence
Model Selection via Likelihood
Training Objective as Loss Minimization over a Dataset
Mathematical Equivalence of General and Sequential MLE Objectives
A researcher is modeling a series of coin flips. They observe the following sequence of outcomes: Heads, Tails, Heads, Heads. The researcher wants to find the best parameter for their model, where the parameter represents the probability of the coin landing on Heads. According to the principle of maximum likelihood estimation, which of the following parameter values best explains the observed data? (A worked solution sketch appears after this list.)
Parameter Estimation via Conditional Log-Likelihood Maximization
Equivalence of Maximizing Likelihood and Minimizing Loss
Equivalence of Squared Loss and Maximum Likelihood Estimation
Negative Log-Likelihood Objective for Softmax Regression
Pseudolikelihood
Normalizing Model Outputs
A model produces unnormalized scores for three possible outcomes: {Outcome A: 8, Outcome B: 10, Outcome C: 2}. To convert these scores into a valid probability distribution, a normalization constant must be calculated by summing all the unnormalized scores. What is the final, normalized probability for Outcome B? (See the worked calculation after this list.)
Computational Cost of Normalization
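A worked solution sketch for the coin-flip question above, assuming an independent Bernoulli model with parameter $\theta = P(\text{Heads})$: with 3 Heads and 1 Tails, the likelihood $\theta^{3}(1-\theta)$ is maximized at

$$\hat{\theta} = \arg\max_{\theta \in [0,1]} \theta^{3}(1-\theta) = \frac{3}{4}$$

so, among any candidate values, $\theta = 0.75$ (the empirical frequency of Heads) best explains the observed data.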
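For the normalization question above, following the stated convention of summing the raw scores to obtain the normalization constant:

$$Z = 8 + 10 + 2 = 20, \qquad P(\text{Outcome B}) = \frac{10}{20} = 0.5$$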