Concept

Cross-Layer Parameter Sharing in BERT

A technique to reduce the size of BERT models is to share parameters across the layers of the Transformer stack. This can be implemented by defining a single Transformer layer and reusing its parameters at every depth of the stack. This approach decreases the total number of unique parameters and thus reduces the memory footprint during inference, although the amount of computation per forward pass stays the same.
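
The sketch below illustrates the idea; it is a minimal example assuming PyTorch, and the class name SharedLayerEncoder and its default sizes are illustrative rather than taken from any particular BERT implementation. One encoder layer's weights are created once and applied repeatedly, so the stack is twelve layers deep while storing only one layer's worth of unique parameters.

import torch
import torch.nn as nn

class SharedLayerEncoder(nn.Module):
    """BERT-style encoder in which one Transformer layer is reused at every depth."""

    def __init__(self, hidden_size=768, num_heads=12, ffn_size=3072, num_layers=12):
        super().__init__()
        # A single layer's parameters are created once...
        self.shared_layer = nn.TransformerEncoderLayer(
            d_model=hidden_size,
            nhead=num_heads,
            dim_feedforward=ffn_size,
            batch_first=True,
        )
        self.num_layers = num_layers

    def forward(self, x):
        # ...and applied num_layers times: depth-wise reuse of the same weights.
        for _ in range(self.num_layers):
            x = self.shared_layer(x)
        return x

# The parameter count equals that of a single layer, not of twelve distinct layers.
model = SharedLayerEncoder()
print(sum(p.numel() for p in model.parameters()))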
