Multiple Choice

An input vector to a neural network layer consists of elements that are all large positive values. This vector is processed by two different normalization techniques. Technique A first calculates the average of the elements and subtracts it from each element, then scales the result. Technique B bypasses the subtraction step and only scales the elements based on their root mean square magnitude. Which statement best describes the fundamental difference between the output vectors produced by these two techniques?
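To make the two procedures concrete, here is a minimal sketch in plain Python. It assumes Technique A corresponds to a layer-normalization-style operation (subtract the mean, divide by the standard deviation) and Technique B to an RMS-normalization-style operation (divide by the root mean square). The function names, the `eps` constant, and the sample vector are illustrative assumptions, not part of the question, and learnable gain and bias parameters are omitted.

```python
import math

def layer_norm(x, eps=1e-5):
    """Technique A (assumed form): subtract the mean, then scale by the standard deviation."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    var = sum(c * c for c in centered) / len(x)
    return [c / math.sqrt(var + eps) for c in centered]

def rms_norm(x, eps=1e-5):
    """Technique B (assumed form): no mean subtraction, scale by the root mean square."""
    mean_sq = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(mean_sq + eps) for v in x]

# Hypothetical input: every element is a large positive value.
x = [100.0, 102.0, 104.0, 106.0]

a = layer_norm(x)  # centered output: elements spread around zero
b = rms_norm(x)    # rescaled output: elements keep their original sign

print("Technique A:", a, "mean =", sum(a) / len(a))
print("Technique B:", b, "mean =", sum(b) / len(b))
```

Running the sketch on an all-positive input lets you compare the mean and the signs of the two outputs directly.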

Updated 2025-09-26

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science
