Learn Before
Log-Likelihood Gradient
What makes learning undirected models by maximum likelihood particularly difficult is that the partition function depends on the parameters:
∇θ log p(x; θ) = ∇θ log p̃(x; θ) − ∇θ log Z(θ)
For the gradient of the log partition function, the following identity holds under certain regularity conditions:
∇θ log Z = Ex∼p(x) [∇θ log p̃(x)]
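This identity can be derived in a few lines (assuming a discrete domain, p̃(x) > 0 for all x, and conditions that allow exchanging the gradient with the sum):

```latex
\begin{aligned}
\nabla_\theta \log Z
  &= \frac{\nabla_\theta Z}{Z}
   = \frac{1}{Z} \nabla_\theta \sum_x \tilde{p}(x)
   = \frac{1}{Z} \sum_x \nabla_\theta \tilde{p}(x) \\
  &= \sum_x \frac{\tilde{p}(x)}{Z} \, \nabla_\theta \log \tilde{p}(x)
   = \sum_x p(x) \, \nabla_\theta \log \tilde{p}(x)
   = \mathbb{E}_{x \sim p(x)}\!\left[\nabla_\theta \log \tilde{p}(x)\right]
\end{aligned}
```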
The Monte Carlo approach to learning undirected models provides an intuitive framework in which we can consider both the positive and the negative phase of the gradient.
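The positive/negative-phase decomposition can be sketched numerically on a toy model. Everything below (the 4-state model with log p̃(x; θ) = θ[x], the made-up data, and the exact sampler) is an illustrative assumption, not part of the source; in practice the negative-phase samples would come from MCMC, precisely because exact sampling requires the Z we are trying to avoid.

```python
import numpy as np

# Toy energy-based model (assumed setup): 4 discrete states,
# log p~(x; theta) = theta[x], so grad_theta log p~(x) is one-hot at x.
rng = np.random.default_rng(0)
theta = np.array([0.5, -0.2, 0.1, 0.3])
p = np.exp(theta) / np.exp(theta).sum()  # model distribution p(x; theta)

# Negative phase, exact: E_{x~p}[grad log p~(x)] equals p itself here.
exact_neg_phase = p

# Negative phase, Monte Carlo: average one-hot gradients over model samples.
# (We cheat and sample exactly; a real implementation would use MCMC.)
n = 100_000
samples = rng.choice(len(theta), size=n, p=p)
mc_neg_phase = np.bincount(samples, minlength=len(theta)) / n

# Positive phase: average grad log p~ over (hypothetical) observed data.
data = np.array([0, 0, 2, 3])
pos_phase = np.bincount(data, minlength=len(theta)) / len(data)

# Monte Carlo estimate of the log-likelihood gradient: positive - negative.
grad_estimate = pos_phase - mc_neg_phase
```

With 100,000 samples the Monte Carlo negative phase closely tracks the exact expectation, which is what makes the sampling-based view of learning workable.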
Tags
Data Science
Related
Pseudolikelihood
Log-Likelihood Gradient
Normalizing Model Outputs
A model produces unnormalized scores for three possible outcomes: {Outcome A: 8, Outcome B: 10, Outcome C: 2}. To convert these scores into a valid probability distribution, a normalization constant must be calculated by summing all the unnormalized scores. What is the final, normalized probability for Outcome B?
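The arithmetic the question asks for can be checked directly (a short sketch; the sum-then-divide rule is the one stated in the question itself):

```python
# Normalize unnormalized scores by their sum to get a probability distribution.
scores = {"A": 8, "B": 10, "C": 2}
Z = sum(scores.values())  # normalization constant: 8 + 10 + 2 = 20
probs = {k: v / Z for k, v in scores.items()}
# Outcome B: 10 / 20 = 0.5
```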
Computational Cost of Normalization