Concept

Calculations for Inference Optimization

The bound is defined as

$$\mathcal{L}(v,\theta,q)=\log p(v;\theta)-D_{\mathrm{KL}}\big(q(h\mid v)\,\|\,p(h\mid v;\theta)\big),$$

where $q$ is an arbitrary probability distribution over $h$. $\mathcal{L}$ always has at most the same value as the desired log-probability, since the difference between $\log p(v)$ and $\mathcal{L}(v,\theta,q)$ is the KL divergence, which is always nonnegative. The two are equal if and only if $q$ is the same distribution as $p(h\mid v)$. $\mathcal{L}$ can be rearranged through algebra into the simpler form

$$\mathcal{L}(v,\theta,q)=\mathbb{E}_{h\sim q}[\log p(h,v)]+H(q).$$

Thus, we can think of inference as the procedure for finding the $q$ that maximizes $\mathcal{L}$.
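To make the bound concrete, here is a minimal numerical sketch in Python. The joint distribution `p_joint` and all numbers in it are hypothetical, chosen only for illustration. It evaluates $\mathcal{L}$ in the $\mathbb{E}_{h\sim q}[\log p(h,v)]+H(q)$ form, checks that it never exceeds $\log p(v)$ for an arbitrary $q$, and checks that the bound is tight when $q$ equals the exact posterior $p(h\mid v)$:

```python
import numpy as np

# Hypothetical toy model: binary latent h and binary visible v.
# Rows index h, columns index v; entries are p(h, v).
p_joint = np.array([[0.30, 0.10],
                    [0.15, 0.45]])

v = 0  # the observed visible value

# log p(v): marginalize the joint over h.
log_p_v = np.log(p_joint[:, v].sum())

# Exact posterior p(h | v), obtained by normalizing the column for v.
posterior = p_joint[:, v] / p_joint[:, v].sum()

def elbo(q):
    """L(v, q) = E_{h~q}[log p(h, v)] + H(q) for a distribution q over h."""
    expected_log_joint = np.sum(q * np.log(p_joint[:, v]))
    entropy = -np.sum(q * np.log(q))
    return expected_log_joint + entropy

# The bound holds for an arbitrary q over h ...
q_arbitrary = np.array([0.5, 0.5])
assert elbo(q_arbitrary) <= log_p_v + 1e-12

# ... and is tight exactly when q is the posterior p(h | v).
assert np.isclose(elbo(posterior), log_p_v)

print(f"log p(v)        = {log_p_v:.6f}")
print(f"ELBO, uniform q = {elbo(q_arbitrary):.6f}")
print(f"ELBO, posterior = {elbo(posterior):.6f}")
```

Running this prints an ELBO for the uniform $q$ strictly below $\log p(v)$, and an ELBO equal to $\log p(v)$ at the posterior, mirroring the equality condition stated above.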


Updated 2021-07-29

Tags

Data Science