Shared Weight and Shared Activation Methods
Shared weight and shared activation methods are a family of optimization techniques widely used in neural network architectures such as Transformers. Weight sharing reuses the same model parameters across different components (for example, across layers), which improves parameter efficiency and shrinks the overall model size; activation sharing reuses intermediate representations (for example, cached attention keys and values) across components, which cuts redundant computation and memory use during inference.
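As a concrete illustration of the weight-sharing idea, here is a minimal sketch (a hypothetical PyTorch example, not taken from the course material) in which a single encoder layer's weights are reused at every depth position, so adding depth does not add parameters:

```python
# Hypothetical sketch of cross-layer weight sharing (ALBERT-style):
# one set of learnable weights is applied repeatedly instead of
# allocating a new layer for each depth position.
import torch
import torch.nn as nn

class SharedWeightEncoder(nn.Module):
    def __init__(self, d_model=512, n_heads=8, n_layers=12):
        super().__init__()
        # A single encoder layer; its parameters are reused n_layers times.
        self.shared_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.n_layers = n_layers

    def forward(self, x):
        for _ in range(self.n_layers):
            x = self.shared_layer(x)  # same weights at every depth
        return x

model = SharedWeightEncoder()
x = torch.randn(2, 16, 512)   # (batch, sequence, d_model)
y = model(x)
print(y.shape)                # torch.Size([2, 16, 512])
```

Shared-activation methods follow the same reuse principle but apply it to intermediate results (such as cached attention keys and values) rather than to the weights themselves.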

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Architectural Adaptation of LLMs for Long Sequences
Quadratic Complexity's Impact on Transformer Inference Speed
Computational Infeasibility of Standard Transformers for Long Sequences
Key-Value (KV) Cache in Transformer Inference
Analyzing Model Processing Time
A key component in a modern neural network architecture for processing text has a computational cost that grows quadratically with the length of the input sequence. If processing a sequence of 512 tokens takes 2 seconds on a specific hardware setup, approximately how long would it take to process a sequence of 2048 tokens, assuming all other factors are constant?
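A worked sketch of the arithmetic, assuming the cost is purely quadratic in sequence length and everything else stays fixed:

$$t_{2048} \approx t_{512} \cdot \left(\frac{2048}{512}\right)^{2} = 2\,\mathrm{s} \times 4^{2} = 32\,\mathrm{s}$$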
Analyzing Computational Scaling
Learn After
Cross-Layer Parameter Sharing in Transformers
A team of engineers is building a deep neural network to analyze very long text sequences. They discover that the model's size is exceeding their hardware's memory capacity. As a solution, they modify the architecture to make multiple layers use the exact same set of learnable parameters. What is the primary trade-off the engineers must consider with this parameter-sharing approach?
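To make the memory side of that trade-off concrete, here is a hypothetical PyTorch comparison (not from the course material) of parameter counts for a 12-layer stack with and without cross-layer sharing; the computation per forward pass is unchanged, but the shared model has fewer distinct weights to learn with at the same depth:

```python
# Hypothetical parameter-count comparison: 12 distinct layers vs.
# one layer reused 12 times. Only stored weights shrink; compute per
# forward pass is the same in both configurations.
import torch.nn as nn

def count_params(module):
    return sum(p.numel() for p in module.parameters())

def make_layer():
    return nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)

unshared = nn.ModuleList([make_layer() for _ in range(12)])  # 12 sets of weights
shared = make_layer()                                        # 1 set, reused 12x

print(f"unshared 12-layer stack: {count_params(unshared):,} parameters")
print(f"shared 12-layer stack:   {count_params(shared):,} parameters (1/12 of the above)")
```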
Optimizing a Transformer for a Low-Resource Environment
A key strategy for creating more efficient neural networks involves reusing parts of the model. Analyze the following concepts related to this strategy and match each term to its most accurate description.