Learn Before
An engineering team needs to make a large language model more efficient for deployment. They are considering two distinct compression methods. Match each method with its corresponding description.
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Pruning for BERT Compression
Quantization for BERT Compression
A development team is working to optimize a large, pre-trained language model for a real-time translation application. The model's current inference speed is too slow. They are considering two strategies: (1) removing a fixed number of attention heads from each layer, or (2) representing all model parameters with lower-precision numbers. Which statement best distinguishes the primary impact of these two compression techniques in this context? (A code sketch contrasting the two strategies appears below.)
BERT Compression Strategy for Mobile Deployment
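For concreteness, here is a minimal sketch of the two strategies applied to a BERT encoder, assuming PyTorch and the Hugging Face Transformers library are installed; the checkpoint name, layer indices, and head indices are illustrative assumptions, not values prescribed by the exercises above.

import torch
from transformers import BertModel

# Load a pre-trained BERT encoder (checkpoint name is an illustrative choice).
model = BertModel.from_pretrained("bert-base-uncased")

# Strategy 1: pruning -- structurally remove attention heads 0 and 1 from
# encoder layers 0 and 1. The corresponding weight slices are deleted, so
# the parameter count and per-layer compute genuinely shrink.
model.prune_heads({0: [0, 1], 1: [0, 1]})

# Strategy 2: quantization -- keep every parameter, but store Linear-layer
# weights as 8-bit integers instead of 32-bit floats. The architecture is
# untouched; memory footprint drops and, on supported hardware, so does latency.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

The sketch makes the contrast concrete: pruning changes the network itself (fewer heads, fewer weights), while quantization preserves the architecture and lowers only the numeric precision of the stored parameters, which is precisely the distinction the question above asks learners to draw.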