Multiple Choice

A development team is working to optimize a large, pre-trained language model for a real-time translation application. The model's current inference speed is too slow. They are considering two strategies: (1) pruning a fixed number of attention heads from each layer, or (2) quantizing all model parameters to lower-precision representations. Which statement best distinguishes the primary impact of these two compression techniques in this context?
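To make the contrast concrete, here is a minimal NumPy sketch of the two strategies. The shapes are toy values chosen for illustration, and the quantization scheme (symmetric, per-tensor int8) is one common assumption, not the only option: pruning removes whole heads so both compute and memory shrink, while quantization keeps every parameter but stores each in fewer bits, trading a small approximation error for a large memory reduction.

```python
import numpy as np

# Toy weight tensor standing in for one layer's attention parameters:
# 8 heads, each a 64x64 projection (hypothetical shapes for illustration).
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 64, 64)).astype(np.float32)

# Strategy 1: structured pruning -- drop 2 of the 8 heads entirely.
# Whole blocks of computation disappear, so inference gets faster too.
kept_heads = w[:6]

# Strategy 2: quantization -- keep every parameter, but store each one
# as int8 instead of float32 (symmetric, per-tensor scale).
scale = np.abs(w).max() / 127.0
w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_dequant = w_int8.astype(np.float32) * scale  # approximate reconstruction

print("pruned memory ratio:   ", kept_heads.nbytes / w.nbytes)  # 6/8 = 0.75
print("quantized memory ratio:", w_int8.nbytes / w.nbytes)      # 1/4 = 0.25
print("max quantization error:", float(np.abs(w - w_dequant).max()))
```

Note the asymmetry the question is probing: pruning discards information (the removed heads are gone, potentially hurting accuracy on what they handled), whereas quantization keeps the full architecture but introduces a small, bounded rounding error in every weight.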

Updated 2025-09-26

Tags: Ch.1 Pre-training - Foundations of Large Language Models; Foundations of Large Language Models; Foundations of Large Language Models Course; Computing Sciences; Analysis in Bloom's Taxonomy; Cognitive Psychology; Psychology; Social Science; Empirical Science; Science