Multiple Choice

In the context of a reward-weighted probability distribution, defined as $\pi^{*}(\mathbf{y}|\mathbf{x}) = \frac{\pi_{\theta_{\text{ref}}}(\mathbf{y}|\mathbf{x}) \exp\left(\frac{1}{\beta}r(\mathbf{x}, \mathbf{y})\right)}{Z(\mathbf{x})}$, consider a scenario where a specific output, $\mathbf{y}_A$, receives a very high reward, $r(\mathbf{x}, \mathbf{y}_A)$. However, the reference distribution assigns a probability to this output that is extremely close to zero, i.e., $\pi_{\theta_{\text{ref}}}(\mathbf{y}_A|\mathbf{x}) \approx 0$. What is the approximate probability of $\mathbf{y}_A$ under the final distribution, $\pi^{*}(\mathbf{y}_A|\mathbf{x})$?

0

1
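
A toy numerical check can make the answer concrete: because $\pi^{*}$ multiplies the reference probability by the exponentiated reward, a near-zero $\pi_{\theta_{\text{ref}}}(\mathbf{y}_A|\mathbf{x})$ suppresses $\mathbf{y}_A$ even when its reward is large. The sketch below uses made-up illustrative values for the rewards and reference probabilities; only the formula itself comes from the question.

```python
import math

# Reward-weighted distribution: pi*(y|x) ∝ pi_ref(y|x) * exp(r(x, y) / beta).
# Illustrative toy values (not from the question): y_A has the highest reward
# but a reference probability that is essentially zero.
beta = 1.0
pi_ref = {"y_A": 1e-12, "y_B": 0.6, "y_C": 0.4}
reward = {"y_A": 10.0, "y_B": 1.0, "y_C": 0.5}

# Unnormalized weights, then normalize by Z(x) = sum over candidate outputs.
unnorm = {y: p * math.exp(reward[y] / beta) for y, p in pi_ref.items()}
Z = sum(unnorm.values())
pi_star = {y: w / Z for y, w in unnorm.items()}

# exp(10) ≈ 2.2e4, but multiplied by 1e-12 it is still ≈ 2.2e-8,
# so pi*(y_A|x) remains vanishingly small after normalization.
print(pi_star["y_A"])
```

Since the exponential factor is finite for any finite reward, it can only rescale the reference probability, never rescue an output the reference model effectively rules out.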

Updated 2025-10-08


Tags

Ch.4 Alignment - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science