Learn Before
Compression of Pre-trained Models
Since pre-trained models (PTMs) usually consist of hundreds of millions of parameters, they are difficult to deploy in online services for real-world applications or on resource-restricted devices. Model compression is an approach used to reduce the model size and increase computational efficiency.
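One common compression technique is post-training quantization: storing weights in a lower-precision format such as int8 instead of float32. The sketch below is a minimal, self-contained illustration using NumPy on a random stand-in weight matrix (not a real PTM); the function names and the symmetric per-tensor scheme are illustrative assumptions, not a specific library's API.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization of float32 weights to int8.

    Illustrative sketch: the largest weight magnitude is mapped to 127,
    so every weight is stored in one byte instead of four.
    """
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover a float32 approximation of the original weights."""
    return q.astype(np.float32) * scale

# Stand-in "weight matrix" for demonstration purposes only.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(256, 256)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(w.nbytes // q.nbytes)  # storage shrinks by a factor of 4
print(np.abs(w - w_hat).max() <= scale / 2)  # rounding error is bounded
```

This captures the core trade-off of compression: a roughly 4x reduction in storage (and typically faster integer arithmetic at inference time) in exchange for a small, bounded rounding error in each weight.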
Tags
Data Science
Foundations of Large Language Models Course
Computing Sciences
Learn After
Ways to compress PTMs
A development team has created a large, high-performance language model for a new smartphone application that provides real-time text summarization. During user testing, they observe that while the summaries are highly accurate, the application is slow to respond and causes the phone's battery to drain rapidly. Which of the following strategies would be the most appropriate first step to address these specific performance issues on the device?
Deployment Strategy for a New AI Assistant
Deployment Challenges of Large Models
For real-world applications, applying compression techniques to a large pre-trained model is often an effective deployment strategy because it reduces model size and improves computational efficiency; however, compression typically involves some trade-off in model performance, so it is not guaranteed to leave accuracy uncompromised.