Short Answer

Analyzing the Asymmetry in Soft Prompt Optimization

Consider the formula for finding an optimal soft prompt, $\hat{\sigma}$, by minimizing the difference between two probability distributions:

$$\hat{\sigma} = \underset{\sigma}{\arg\min}\, \mathrm{KL}\big(\mathrm{Pr}(\cdot \mid \mathbf{c}, \mathbf{z}) \,\|\, \mathrm{Pr}(\cdot \mid \sigma, \mathbf{z})\big)$$

In this formula, $\mathrm{Pr}(\cdot \mid \mathbf{c}, \mathbf{z})$ is the probability distribution over possible outputs given a full context $\mathbf{c}$ and an input $\mathbf{z}$, while $\mathrm{Pr}(\cdot \mid \sigma, \mathbf{z})$ is the distribution given a soft prompt $\sigma$ and the same input $\mathbf{z}$.

Explain why $\mathrm{Pr}(\cdot \mid \mathbf{c}, \mathbf{z})$ is treated as the first argument (the 'true' distribution) and $\mathrm{Pr}(\cdot \mid \sigma, \mathbf{z})$ as the second argument within the KL divergence function, and not the other way around. What would be the conceptual implication of swapping their positions?
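As background for the question, the asymmetry of KL divergence can be checked numerically: for two distinct discrete distributions $p$ and $q$, $\mathrm{KL}(p \| q)$ and $\mathrm{KL}(q \| p)$ generally differ. The sketch below uses two made-up three-outcome distributions standing in for $\mathrm{Pr}(\cdot \mid \mathbf{c}, \mathbf{z})$ and $\mathrm{Pr}(\cdot \mid \sigma, \mathbf{z})$; the specific numbers are illustrative assumptions, not values from the text.

```python
import math

def kl(p, q):
    # KL(p || q) = sum over outcomes x of p(x) * log(p(x) / q(x));
    # terms with p(x) = 0 contribute zero by convention.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy distributions over three possible outputs (hypothetical values).
p = [0.7, 0.2, 0.1]  # stands in for the target Pr(. | c, z)
q = [0.4, 0.4, 0.2]  # stands in for the model Pr(. | sigma, z)

print(kl(p, q))  # forward KL: expectation taken under p
print(kl(q, p))  # reverse KL: expectation taken under q, a different value
```

Because the expectation in $\mathrm{KL}(p \| q)$ is taken under the first argument, swapping the arguments changes which distribution weights the log-ratio, which is exactly the conceptual point the question probes.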

Updated 2025-10-08

Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science