Multiple Choice

A language model architecture is designed to predict the next token by using two parallel computational streams that originate from the same query vector. The first stream uses the immediate, local context to generate a probability distribution over the vocabulary. The second stream uses the query vector to search a large external datastore, find the most similar historical contexts, and generate a second probability distribution based on the tokens that followed those contexts. The two distributions are then combined to produce the final prediction. What is the primary functional distinction between the information provided by these two streams?
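The architecture described is the same two-stream scheme used by retrieval-augmented models such as kNN-LM: a parametric distribution from the local context is interpolated with a distribution built from the next tokens of retrieved nearest-neighbour contexts. A minimal sketch of that combination, with illustrative names, a toy datastore, and an assumed interpolation weight `lam`:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def knn_lm_predict(query, p_lm, keys, next_tokens, vocab_size, k=3, lam=0.25):
    """Combine a parametric LM distribution with a retrieval-based one.

    query       : context vector for the current position, shape (d,)
    p_lm        : next-token distribution from the local/parametric stream
    keys        : datastore of stored context vectors, shape (n, d)
    next_tokens : token id that followed each stored context, shape (n,)
    lam         : interpolation weight given to the retrieval stream (assumed)
    """
    # Retrieval stream: find the k stored contexts most similar to the query
    dists = np.linalg.norm(keys - query, axis=1)
    nn = np.argsort(dists)[:k]
    weights = softmax(-dists[nn])          # closer contexts get more weight

    # Turn neighbour weights into a second distribution over the vocabulary,
    # placed on the tokens that historically followed those contexts
    p_knn = np.zeros(vocab_size)
    for w, tok in zip(weights, next_tokens[nn]):
        p_knn[tok] += w

    # Final prediction: interpolate the two streams
    return lam * p_knn + (1.0 - lam) * p_lm
```

The sketch makes the functional distinction concrete: `p_lm` generalises from the immediate context via learned parameters, while `p_knn` memorises and looks up exact historical continuations from the datastore.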

Updated 2025-09-28

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science