Linear Combination of Local and External Attention
When incorporating both a local memory and a retrieved long-term memory into a language model, one architectural approach is to process them in separate attention steps. As exemplified by the model developed by Wu et al. (2021), the outputs from the local attention mechanism and the external k-NN attention mechanism can then be linearly combined to produce the final representation.
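The idea can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names are hypothetical, the gate is shown as a single scalar logit (in Wu et al.'s model the gate is a learned per-head parameter), and the k-NN retrieval that produces the external keys and values is assumed to have already happened.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Standard scaled dot-product attention for a single head.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    return softmax(scores) @ v

def combined_attention(q, local_k, local_v, mem_k, mem_v, gate_logit):
    # Attention over the local context window.
    local_out = attention(q, local_k, local_v)
    # Attention over (key, value) pairs retrieved from long-term memory
    # (assumed to be the top-k neighbours returned by a k-NN search).
    mem_out = attention(q, mem_k, mem_v)
    # A learned gate g in (0, 1) linearly interpolates the two outputs.
    g = 1.0 / (1.0 + np.exp(-gate_logit))  # sigmoid
    return g * mem_out + (1.0 - g) * local_out

# Toy shapes: 4 query positions, head dimension 16,
# a local context of 8 tokens, and 5 retrieved neighbours.
rng = np.random.default_rng(0)
d = 16
q = rng.normal(size=(4, d))
local_k, local_v = rng.normal(size=(8, d)), rng.normal(size=(8, d))
mem_k, mem_v = rng.normal(size=(5, d)), rng.normal(size=(5, d))

out = combined_attention(q, local_k, local_v, mem_k, mem_v, gate_logit=0.0)
print(out.shape)  # one combined representation per query position
```

Because the combination is a simple interpolation, the model can learn, per head, how much to rely on retrieved memories versus the immediate context: a gate near 0 reduces to ordinary local attention, while a gate near 1 attends almost entirely to the external memory.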