Learn Before
A machine learning engineer is comparing two normalization functions for a neural network layer. The input is a vector h, and ε is a small constant for numerical stability.
Function A: output = gain * ((h - mean(h)) / (std_dev(h) + ε)) + bias
Function B: output = gain * (h / (root_mean_square(h) + ε)) + bias
What is the primary consequence of Function B omitting the mean subtraction (- mean(h)) that Function A performs?
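The difference can be demonstrated directly. The sketch below implements both functions in numpy (the names layer_norm and rms_norm are illustrative, not from the card) and shows that Function A is invariant to adding a constant offset to the input, while Function B is not, because only Function A re-centers the input:

```python
import numpy as np

def layer_norm(h, gain=1.0, bias=0.0, eps=1e-8):
    # Function A: re-centers (subtracts the mean) and re-scales by std dev.
    return gain * (h - h.mean()) / (h.std() + eps) + bias

def rms_norm(h, gain=1.0, bias=0.0, eps=1e-8):
    # Function B: re-scales by the root mean square only; no re-centering.
    rms = np.sqrt(np.mean(h ** 2))
    return gain * h / (rms + eps) + bias

h = np.array([1.0, 5.0, 7.0])
shifted = h + 100.0  # add a large constant offset to every component

# Function A removes the offset via the mean subtraction, so its
# output is unchanged; Function B's output shifts, because the offset
# changes both the numerator and the RMS.
print(np.allclose(layer_norm(h), layer_norm(shifted)))  # True
print(np.allclose(rms_norm(h), rms_norm(shifted)))      # False
```

In other words, Function B (RMS normalization) gives up re-centering invariance in exchange for a cheaper computation: the mean and its subtraction are never computed.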
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An input vector h = [1, 5, 7] is passed through a normalization layer. The layer computes the output using the formula α * (h / (sqrt(mean(h^2)) + ε)) + β. Given a learnable gain parameter α = 1.5, a learnable bias parameter β = 0.5, and a numerically stabilizing constant ε that is small enough to be ignored in this calculation, what is the resulting output vector?
Debugging RMS Layer Normalization Output
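The related card's arithmetic can be checked in a few lines of numpy (treating ε as 0, as the card allows):

```python
import numpy as np

# Related card: h = [1, 5, 7], gain alpha = 1.5, bias beta = 0.5.
h = np.array([1.0, 5.0, 7.0])
rms = np.sqrt(np.mean(h ** 2))  # sqrt((1 + 25 + 49) / 3) = sqrt(25) = 5
out = 1.5 * h / rms + 0.5       # 1.5 * [0.2, 1.0, 1.4] + 0.5
print(out)                      # approximately [0.8, 2.0, 2.6]
```

So the resulting output vector is approximately [0.8, 2.0, 2.6].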