Learn Before
Kullback-Leibler Divergence
Kullback-Leibler (KL) divergence, also known as relative entropy, measures how one probability distribution diverges from a second, reference probability distribution. For discrete probability distributions $P$ and $Q$ defined on the same probability space, the KL divergence from $Q$ to $P$, denoted $D_{\mathrm{KL}}(P \,\|\, Q)$, is the expectation of the logarithmic difference between the probabilities given by the two distributions, where the expectation is taken using the probabilities of $P$. The formula is: $D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}$. KL divergence is non-negative ($D_{\mathrm{KL}}(P \,\|\, Q) \ge 0$) and is zero if and only if $P$ and $Q$ are identical. It is an asymmetric measure, meaning that $D_{\mathrm{KL}}(P \,\|\, Q)$ is generally not equal to $D_{\mathrm{KL}}(Q \,\|\, P)$.
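The definition above can be sketched directly in code. This is a minimal illustration, not part of the course material; the distributions `p` and `q` are made-up examples, and terms where $P(x) = 0$ are skipped since they contribute zero to the sum:

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) for discrete distributions given as probability lists.

    Terms with p_i == 0 contribute 0 (by convention 0 * log 0 = 0).
    Assumes q_i > 0 wherever p_i > 0; otherwise the divergence is infinite.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical example distributions over three outcomes
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

print(kl_divergence(p, q))  # positive, since p and q differ
print(kl_divergence(p, p))  # 0.0, distributions are identical
print(kl_divergence(q, p))  # generally != kl_divergence(p, q): asymmetry
```

Running this shows the three properties stated above: the divergence is positive for differing distributions, exactly zero for identical ones, and depends on the order of its arguments.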
Tags
Data Science
Foundations of Large Language Models Course
Computing Sciences
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Related
Relationship between KL Divergence and MLE
Cross-entropy loss
Mean Squared Error
The property of consistency of maximum likelihood
Statistical Efficiency Principle of MLE
Maximum Likelihood Estimator Properties
Log-Likelihood Gradient
Maximum Likelihood Training Objective for a Dataset of Sequences
Kullback-Leibler Divergence
Model Selection via Likelihood
Training Objective as Loss Minimization over a Dataset
Mathematical Equivalence of General and Sequential MLE Objectives
A researcher is modeling a series of coin flips. They observe the following sequence of outcomes: Heads, Tails, Heads, Heads. The researcher wants to find the best parameter for their model, where the parameter represents the probability of the coin landing on Heads. According to the principle of maximum likelihood estimation, which of the following parameter values best explains the observed data?
Parameter Estimation via Conditional Log-Likelihood Maximization
Equivalence of Maximizing Likelihood and Minimizing Loss
Equivalence of Squared Loss and Maximum Likelihood Estimation
Negative Log-Likelihood Objective for Softmax Regression
Learn After
Formula for Soft Prompt Optimization by Minimizing KL Divergence
Derivation of the KL Divergence Objective for Policy Optimization
A machine learning model produces a probability distribution Q over a set of outcomes, aiming to approximate a true data distribution P. During evaluation, you observe that the divergence measure is low, while the reverse measure is high. Based on these results, what is the most likely characteristic of the model's distribution Q?
Calculating Divergence Between Distributions
Choosing a Loss Function for Model Distillation