Learn Before
Dynamic Networks for Efficient BERT Inference
Dynamic networks make deep models like BERT more efficient at inference time by adapting the amount of computation to each input. This paradigm includes depth-adaptive models, which dynamically choose how many layers to apply to a given input and skip the remainder of the layer stack, and length-adaptive models, which shorten the input sequence by dropping less informative tokens so that every subsequent layer processes fewer positions.
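The two ideas above can be illustrated with a minimal sketch. This is not any particular published method: the random weights, the confidence-threshold early exit, and the L2-norm token-importance score are all stand-in assumptions chosen only to show the control flow of depth-adaptive and length-adaptive inference.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_LAYERS = 12
HIDDEN = 16

# Hypothetical per-layer transforms and a shared classifier head;
# random weights stand in for trained parameters.
layer_weights = [rng.standard_normal((HIDDEN, HIDDEN)) / np.sqrt(HIDDEN)
                 for _ in range(NUM_LAYERS)]
classifier = rng.standard_normal((HIDDEN, 2))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def depth_adaptive_forward(hidden, threshold=0.9):
    """Depth-adaptive inference: run layers one at a time and exit
    early once the intermediate classifier is confident enough."""
    for depth, w in enumerate(layer_weights, start=1):
        hidden = np.tanh(hidden @ w)          # one layer of computation
        probs = softmax(hidden @ classifier)  # intermediate prediction
        if probs.max() >= threshold:          # confident -> skip the rest
            return probs, depth
    return probs, NUM_LAYERS

def length_adaptive_prune(token_states, keep_ratio=0.5):
    """Length-adaptive inference: drop the least-important tokens
    (L2 norm as a stand-in importance score) before the expensive
    upper layers see them."""
    scores = np.linalg.norm(token_states, axis=-1)
    k = max(1, int(len(token_states) * keep_ratio))
    keep = np.sort(np.argsort(scores)[-k:])   # keep top-k, preserve order
    return token_states[keep]

probs, used = depth_adaptive_forward(rng.standard_normal(HIDDEN))
print(f"exited after {used} of {NUM_LAYERS} layers")
pruned = length_adaptive_prune(rng.standard_normal((8, HIDDEN)))
print(f"kept {len(pruned)} of 8 tokens")
```

Either mechanism trades a small amount of accuracy for compute: easy inputs exit after a few layers or are pruned aggressively, while hard inputs still receive the full computation.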
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Learn After
Depth-Adaptive BERT Models
Length-Adaptive BERT Models
A team of engineers is tasked with optimizing a large language model for real-time text summarization of news articles. They observe that the model's processing time is a major bottleneck. To address this, they implement a mechanism that, for each article, dynamically decides to skip processing certain less-informative sentences entirely, thereby reducing the total amount of text fed through the model's most computationally expensive components. Which principle of efficient model inference does this approach best exemplify?
Match each description of an efficiency technique for language models with the type of dynamic network it represents.
Optimizing a Language Model for Varied Task Complexity