Short Answer

Analyzing the Asymmetry in Soft Prompt Optimization

Consider the formula for finding an optimal soft prompt, $\hat{\sigma}$, by minimizing the difference between two probability distributions:

$$\hat{\sigma} = \underset{\sigma}{\arg\min}\, \mathrm{KL}\big(\mathrm{Pr}(\cdot \mid \mathbf{c}, \mathbf{z}) \,\|\, \mathrm{Pr}(\cdot \mid \sigma, \mathbf{z})\big)$$

In this formula, $\mathrm{Pr}(\cdot \mid \mathbf{c}, \mathbf{z})$ is the probability distribution over possible outputs given a full context $\mathbf{c}$ and an input $\mathbf{z}$, while $\mathrm{Pr}(\cdot \mid \sigma, \mathbf{z})$ is the distribution given a soft prompt $\sigma$ and the same input $\mathbf{z}$.

Explain why $\mathrm{Pr}(\cdot \mid \mathbf{c}, \mathbf{z})$ is treated as the first argument (the 'true' distribution) and $\mathrm{Pr}(\cdot \mid \sigma, \mathbf{z})$ as the second argument within the KL divergence function, and not the other way around. What would be the conceptual implication of swapping their positions?
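As background for the question, the asymmetry of KL divergence can be checked numerically: for two distinct discrete distributions $p$ and $q$, $\mathrm{KL}(p \| q)$ and $\mathrm{KL}(q \| p)$ generally differ. The sketch below uses two made-up three-outcome distributions standing in for $\mathrm{Pr}(\cdot \mid \mathbf{c}, \mathbf{z})$ and $\mathrm{Pr}(\cdot \mid \sigma, \mathbf{z})$; the specific numbers are illustrative assumptions, not values from the text.

```python
import math

def kl(p, q):
    # KL(p || q) = sum over outcomes x of p(x) * log(p(x) / q(x));
    # terms with p(x) = 0 contribute zero by convention.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy distributions over three possible outputs (hypothetical values).
p = [0.7, 0.2, 0.1]  # stands in for the target Pr(. | c, z)
q = [0.4, 0.4, 0.2]  # stands in for the model Pr(. | sigma, z)

print(kl(p, q))  # forward KL: expectation taken under p
print(kl(q, p))  # reverse KL: expectation taken under q, a different value
```

Because the expectation in $\mathrm{KL}(p \| q)$ is taken under the first argument, swapping the arguments changes which distribution weights the log-ratio, which is exactly the conceptual point the question probes.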

Updated 2025-10-08

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science