Computational Challenge of Large-Scale k-NN Datastores
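A significant drawback of populating a k-NN datastore with an entire collection of sequences is the high computational cost. Each entry in the datastore is a key-value pair, and the number of entries grows with the number of sequences covered. As the datastore grows, searching for nearest neighbors becomes computationally intensive: an exact search must compare the query against every stored key, so its cost per query increases linearly with the number of entries.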
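To make this scaling concrete, here is a minimal sketch of exact (brute-force) nearest-neighbor search over a toy datastore. It assumes that keys are context vectors compared by squared Euclidean distance and that values are next-token ids. The function names `build_datastore` and `knn_search` are hypothetical, and the random keys stand in for real model representations. Every query is compared with every stored key, so the number of distance computations equals the datastore size.

```python
import heapq
import random


def build_datastore(num_entries, dim, vocab_size, seed=0):
    """Hypothetical toy datastore: random key vectors paired with next-token ids."""
    rng = random.Random(seed)
    keys = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_entries)]
    values = [rng.randrange(vocab_size) for _ in range(num_entries)]
    return keys, values


def knn_search(query, keys, values, k):
    """Exact (brute-force) k-NN search by squared L2 distance.

    Returns the k nearest (distance, value) pairs in ascending order of
    distance, plus the number of distance computations performed, which
    always equals len(keys).
    """
    comparisons = 0
    scored = []
    for key, value in zip(keys, values):
        # Each comparison itself costs O(d) for d-dimensional keys.
        dist = sum((q - x) ** 2 for q, x in zip(query, key))
        comparisons += 1
        scored.append((dist, value))
    nearest = heapq.nsmallest(k, scored)
    return nearest, comparisons


if __name__ == "__main__":
    dim, vocab = 16, 100
    for n in (1_000, 10_000):
        keys, values = build_datastore(n, dim, vocab)
        query = [0.0] * dim  # hypothetical query vector
        nearest, comparisons = knn_search(query, keys, values, k=4)
        print(n, comparisons, [v for _, v in nearest])
```

Growing the datastore tenfold raises the comparison count tenfold. Because each comparison is O(d), the per-query cost is O(N·d) for N entries. Practical systems usually mitigate this with approximate nearest-neighbor indexes. These reduce search cost at the price of occasionally missing a true neighbor.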
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Datastore Composition in k-NN Language Models
Consider two language models that use an external datastore of (context -> next word) examples to help generate text.
- Model X populates its datastore only with examples from the specific document it is currently generating.
- Model Y's datastore is pre-filled with millions of examples from a vast and diverse library of texts before it begins generating any new document.
When asked to complete a sentence about a niche historical fact not mentioned earlier in the current document, which model is more likely to perform better and why?
Designing a Memory-Augmented Legal AI
Trade-offs in k-NN Datastore Population