Impact of Scaling Factor on Normalization
Consider the formula for the normalization factor: Describe what happens to the value of $Z(\mathbf{x})$ as the scaling factor $\beta$ becomes very large (approaches infinity), assuming the rewards $r(\mathbf{x}, \mathbf{y})$ are not all zero. Explain your reasoning.
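The formula itself is missing from this card, and the limit depends on how $\beta$ enters it. DPO-style derivations write $Z(\mathbf{x}) = \sum_{\mathbf{y}} \pi_{\text{ref}}(\mathbf{y}|\mathbf{x}) \exp(r(\mathbf{x}, \mathbf{y})/\beta)$, while other texts use $\exp(\beta\, r(\mathbf{x}, \mathbf{y}))$. The minimal sketch below assumes a small discrete output set and a hypothetical helper `normalizer`, and evaluates both conventions for growing $\beta$ so the trend can be seen numerically. The two-output values are borrowed from the related card for illustration only.

```python
import math

def normalizer(pi_ref, rewards, beta, reward_scale="divide"):
    """Hypothetical helper computing Z(x) = sum_y pi_ref(y|x) * exp(s * r(x, y)).

    reward_scale="divide"   -> s = 1 / beta  (DPO-style exp(r / beta))
    reward_scale="multiply" -> s = beta      (exp(beta * r))
    Which convention the card intends is an assumption; its formula is missing.
    """
    scale = 1.0 / beta if reward_scale == "divide" else beta
    return sum(p * math.exp(scale * r) for p, r in zip(pi_ref, rewards))

# Illustrative two-output example (values borrowed from the related card).
pi_ref = [0.6, 0.4]   # reference probabilities, summing to 1
rewards = [2.0, 1.0]  # rewards, not all zero

# Keep beta modest for "multiply": math.exp overflows for arguments above about 709.
for beta in [1.0, 10.0, 100.0]:
    z_div = normalizer(pi_ref, rewards, beta, "divide")
    z_mul = normalizer(pi_ref, rewards, beta, "multiply")
    print(f"beta={beta:>6}: exp(r/beta) -> {z_div:.6g}, exp(beta*r) -> {z_mul:.6g}")
```

At $\beta = 1$ both conventions give about 5.5207. Under $\exp(r/\beta)$, every exponent shrinks toward 0, each exponential tends to 1, and $Z(\mathbf{x})$ tends to $\sum_{\mathbf{y}} \pi_{\text{ref}}(\mathbf{y}|\mathbf{x}) = 1$, so the reward-weighted distribution falls back to $\pi_{\text{ref}}$. Under $\exp(\beta r)$, $Z(\mathbf{x})$ is instead dominated by the highest-reward outputs and grows without bound whenever some output with nonzero reference probability has a positive reward.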
Tags
Ch.4 Alignment - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Reward-Weighted Probability Distribution
Consider a scenario where, for a given input $\mathbf{x}$, there are only two possible outputs, $\mathbf{y}_1$ and $\mathbf{y}_2$. A reference model assigns probabilities $\pi_{\text{ref}}(\mathbf{y}_1|\mathbf{x}) = 0.6$ and $\pi_{\text{ref}}(\mathbf{y}_2|\mathbf{x}) = 0.4$. A reward function gives scores $r(\mathbf{x}, \mathbf{y}_1) = 2$ and $r(\mathbf{x}, \mathbf{y}_2) = 1$. Assuming the scaling factor $\beta$ is 1, what is the value of the normalization factor $Z(\mathbf{x})$, which is calculated as $Z(\mathbf{x}) = \sum_{\mathbf{y}} \pi_{\text{ref}}(\mathbf{y}|\mathbf{x}) \exp(r(\mathbf{x}, \mathbf{y}))$?
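Worked out by hand, $Z(\mathbf{x}) = 0.6\,e^{2} + 0.4\,e^{1} \approx 4.4334 + 1.0873 \approx 5.5207$. The short Python check below reproduces that sum; the names `pi_ref`, `reward`, and `pi_star` are illustrative, and the final step assumes the reward-weighted distribution is $\pi_{\text{ref}}(\mathbf{y}|\mathbf{x}) \exp(r(\mathbf{x}, \mathbf{y})) / Z(\mathbf{x})$, as the card's title suggests.

```python
import math

# Values given in the question.
pi_ref = {"y1": 0.6, "y2": 0.4}  # pi_ref(y | x)
reward = {"y1": 2.0, "y2": 1.0}  # r(x, y)

# With beta = 1 the reward enters the exponent unscaled, as in the question's formula.
z = sum(pi_ref[y] * math.exp(reward[y]) for y in pi_ref)
print(f"Z(x) = {z:.4f}")  # Z(x) = 5.5207

# Assumed use of Z(x): dividing each weighted term by it yields a distribution summing to 1.
pi_star = {y: pi_ref[y] * math.exp(reward[y]) / z for y in pi_ref}
print(pi_star)
```

The higher-reward output $\mathbf{y}_1$ ends up with more probability mass than its reference value of 0.6, which is the point of weighting by $\exp(r)$.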
Consider the calculation of a normalization factor using the formula: If the reward function $r(\mathbf{x}, \mathbf{y})$ consistently returns a value of 0 for all possible outputs $\mathbf{y}$, the normalization factor $Z(\mathbf{x})$ will always be equal to 1.
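The claim holds because $\exp(0) = 1$ for every output, so $Z(\mathbf{x})$ reduces to $\sum_{\mathbf{y}} \pi_{\text{ref}}(\mathbf{y}|\mathbf{x})$, which equals 1 for any probability distribution. This is true whether $\beta$ multiplies or divides the reward, since either way the exponent is 0. A minimal sketch, assuming a hypothetical three-output reference distribution and a hypothetical helper `normalizer`:

```python
import math

def normalizer(pi_ref, rewards, beta):
    # Assumed form: Z(x) = sum_y pi_ref(y|x) * exp(r(x, y) / beta).
    # With all rewards 0, exp(beta * r) would give the same result.
    return sum(p * math.exp(r / beta) for p, r in zip(pi_ref, rewards))

pi_ref = [0.5, 0.3, 0.2]        # hypothetical reference distribution (sums to 1)
zero_rewards = [0.0, 0.0, 0.0]  # reward is 0 for every output

for beta in [0.1, 1.0, 10.0]:
    # exp(0) == 1, so Z reduces to sum(pi_ref) regardless of beta.
    print(f"beta={beta}: Z = {normalizer(pi_ref, zero_rewards, beta)}")
```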