Pruning for BERT Compression

Pruning is a technique for compressing BERT by selectively removing parts of its Transformer network. It can be applied at several granularities: dropping entire encoder layers, zeroing out a chosen fraction of individual weights (unstructured pruning), or discarding specific attention heads (structured pruning). Because much of BERT's capacity is redundant, pruning can substantially speed up model inference, often with only a minor loss in accuracy.
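
As a concrete illustration, here is a minimal sketch of all three granularities, assuming PyTorch and the Hugging Face transformers library are available. The 6-layer depth, 30% sparsity level, and head indices are arbitrary values chosen for demonstration, not recommendations from the text; in practice they would be selected by measuring each component's importance.

```python
import torch
import torch.nn.utils.prune as prune
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")

# 1) Layer pruning: keep only the first 6 of BERT-base's 12 encoder layers.
model.encoder.layer = model.encoder.layer[:6]
model.config.num_hidden_layers = 6

# 2) Unstructured weight pruning: zero the 30% smallest-magnitude weights in
#    every linear layer. Note that this sparsity speeds up inference only on
#    hardware or kernels that actually exploit sparse weight matrices.
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the zeros into the weight tensor

# 3) Attention-head pruning: remove chosen heads per layer. The dict maps a
#    layer index to the head indices to drop; the indices here are arbitrary
#    placeholders standing in for heads ranked lowest by an importance score.
model.prune_heads({0: [0, 1], 5: [4]})
```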
