Short Answer

Restoring Representational Power in Normalization

An AI researcher is working with a Transformer model and has implemented a normalization step for the activations within a specific layer. This step first calculates the mean and standard deviation across all features for a single training example's activation vector, then uses these values to transform the vector to have a mean of 0 and a standard deviation of 1. However, they observe that this strict normalization is hindering the model's ability to learn effectively. Which components should be introduced into the normalization formula to allow the network to learn an optimal scale and shift for the normalized activations, thereby potentially recovering lost representational power? Explain the role of each component.
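The normalization described above, together with the two learnable components the question asks about, can be sketched in NumPy. The answer is a learnable scale (often written gamma) and a learnable shift (often written beta); the function and variable names below are illustrative, not from a specific library:

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize across all features of a single example's
    # activation vector: subtract the mean, divide by the
    # standard deviation (eps guards against division by zero).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # gamma (scale) and beta (shift) are learnable per-feature
    # parameters: gamma rescales the normalized activations and
    # beta re-centers them, so the network can recover any scale
    # and mean that strict normalization removed.
    return gamma * x_hat + beta

x = np.array([1.0, 2.0, 3.0, 4.0])
gamma = np.ones_like(x)   # initialized to 1: identity scale
beta = np.zeros_like(x)   # initialized to 0: no shift
y = layer_norm(x, gamma, beta)
```

With gamma initialized to 1 and beta to 0 the transform starts as plain normalization; during training these parameters are updated by gradient descent like any other weights, letting the model learn the optimal scale and shift per feature.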

Updated 2025-10-03

Tags

Data Science

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science
