Analyzing Long-Range Consistency in Language Models
Analyze the likely reason for the difference in performance between Model A and Model B. Specifically, explain how the architectural feature of Model B allows it to avoid the plot inconsistency seen in Model A's output, even when the relevant information is far outside the standard processing window.
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
k-NN Memory Retrieval
Integrating k-NN Memory with Local Memory in Attention
Populating a k-NN Datastore for Language Modeling
Equivalence Between k-NN and Sparse Attention Models
k-NN Language Modeling (k-NN LM)
Vector Database
A language model is designed to be a question-answering assistant for a large corporate knowledge base containing thousands of separate project documents. A user asks a question about 'Project Alpha,' but the most relevant technical detail needed to answer it is located in a document for 'Project Zeta,' a completely unrelated past project. Which statement best explains the unique advantage of using a k-nearest neighbors (k-NN) based external memory system in this scenario?
In a k-NN based external memory system, the datastore of key-value pairs is not limited to the current sequence: it can be populated with context states drawn from the entire corpus, including unrelated documents such as the 'Project Zeta' report. At query time, the model retrieves the nearest stored keys to its current context, so relevant information can be surfaced even when it lies far outside the model's context window.
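The retrieval mechanism described above can be sketched in a few lines. This is a minimal illustration, not a real kNN-LM: the `embed` function below is a hypothetical character-frequency embedding standing in for the LM's hidden states, and the document names and sentences are invented for the scenario.

```python
import math

def embed(text):
    # Hypothetical toy embedding: normalized letter-frequency vector over a-z.
    # A real k-NN LM would use the language model's hidden state as the key.
    vec = [0.0] * 26
    for ch in text.lower():
        if 'a' <= ch <= 'z':
            vec[ord(ch) - ord('a')] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

class KNNMemory:
    """Datastore of (key, value) pairs spanning all documents,
    not just the sequence currently in the context window."""
    def __init__(self):
        self.keys, self.values = [], []

    def add_document(self, doc_id, sentences):
        # Each stored key is an embedding of a context; the value records
        # where it came from and what it said.
        for s in sentences:
            self.keys.append(embed(s))
            self.values.append((doc_id, s))

    def query(self, text, k=2):
        # Rank all stored keys by cosine similarity to the query embedding
        # (vectors are unit-normalized, so the dot product is the cosine).
        q = embed(text)
        scored = sorted(
            zip(self.keys, self.values),
            key=lambda kv: -sum(a * b for a, b in zip(q, kv[0])),
        )
        return [v for _, v in scored[:k]]

memory = KNNMemory()
memory.add_document("Project Alpha", ["alpha uses the billing api"])
memory.add_document("Project Zeta",
                    ["zeta solved the caching bug with a write-through layer"])

# A query about caching retrieves the Project Zeta entry even though that
# document is entirely unrelated to the current conversation's context.
hits = memory.query("how was the caching bug solved", k=1)
```

A production system would replace the linear scan with an approximate nearest-neighbor index, but the advantage is the same: retrieval ranges over the whole datastore, not over the current context window.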