Learn Before
A machine learning engineer is comparing two normalization functions for a neural network layer. The input is a vector h, and ε is a small constant for numerical stability.
Function A: output = gain * ((h - mean(h)) / (std_dev(h) + ε)) + bias
Function B: output = gain * (h / (root_mean_square(h) + ε)) + bias
What is the primary consequence of Function B omitting the mean subtraction (- mean(h)) that Function A performs?
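The difference can be demonstrated directly. The sketch below implements both functions in numpy (the names layer_norm and rms_norm are illustrative, not from the card) and shows that Function A is invariant to adding a constant offset to the input, while Function B is not, because only Function A re-centers the input:

```python
import numpy as np

def layer_norm(h, gain=1.0, bias=0.0, eps=1e-8):
    # Function A: re-centers (subtracts the mean) and re-scales by std dev.
    return gain * (h - h.mean()) / (h.std() + eps) + bias

def rms_norm(h, gain=1.0, bias=0.0, eps=1e-8):
    # Function B: re-scales by the root mean square only; no re-centering.
    rms = np.sqrt(np.mean(h ** 2))
    return gain * h / (rms + eps) + bias

h = np.array([1.0, 5.0, 7.0])
shifted = h + 100.0  # add a large constant offset to every component

# Function A removes the offset via the mean subtraction, so its
# output is unchanged; Function B's output shifts, because the offset
# changes both the numerator and the RMS.
print(np.allclose(layer_norm(h), layer_norm(shifted)))  # True
print(np.allclose(rms_norm(h), rms_norm(shifted)))      # False
```

In other words, Function B (RMS normalization) gives up re-centering invariance in exchange for a cheaper computation: the mean and its subtraction are never computed.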
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An input vector h = [1, 5, 7] is passed through a normalization layer. The layer computes the output using the formula α * (h / (sqrt(mean(h^2)) + ε)) + β. Given a learnable gain parameter α = 1.5, a learnable bias parameter β = 0.5, and a numerically stabilizing constant ε that is small enough to be ignored in this calculation, what is the resulting output vector?
Debugging RMS Layer Normalization Output
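The related card's arithmetic can be checked in a few lines of numpy (treating ε as 0, as the card allows):

```python
import numpy as np

# Related card: h = [1, 5, 7], gain alpha = 1.5, bias beta = 0.5.
h = np.array([1.0, 5.0, 7.0])
rms = np.sqrt(np.mean(h ** 2))  # sqrt((1 + 25 + 49) / 3) = sqrt(25) = 5
out = 1.5 * h / rms + 0.5       # 1.5 * [0.2, 1.0, 1.4] + 0.5
print(out)                      # approximately [0.8, 2.0, 2.6]
```

So the resulting output vector is approximately [0.8, 2.0, 2.6].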