Multiple Choice

A language model architecture is designed to predict the next token by using two parallel computational streams that originate from the same query vector. The first stream uses the immediate, local context to generate a probability distribution over the vocabulary. The second stream uses the query vector to search a large external datastore, find the most similar historical contexts, and generate a second probability distribution based on the tokens that followed those contexts. The two distributions are then combined to produce the final prediction. What is the primary functional distinction between the information provided by these two streams?
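The architecture described is the same two-stream scheme used by retrieval-augmented models such as kNN-LM: a parametric distribution from the local context is interpolated with a distribution built from the next tokens of retrieved nearest-neighbour contexts. A minimal sketch of that combination, with illustrative names, a toy datastore, and an assumed interpolation weight `lam`:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def knn_lm_predict(query, p_lm, keys, next_tokens, vocab_size, k=3, lam=0.25):
    """Combine a parametric LM distribution with a retrieval-based one.

    query       : context vector for the current position, shape (d,)
    p_lm        : next-token distribution from the local/parametric stream
    keys        : datastore of stored context vectors, shape (n, d)
    next_tokens : token id that followed each stored context, shape (n,)
    lam         : interpolation weight given to the retrieval stream (assumed)
    """
    # Retrieval stream: find the k stored contexts most similar to the query
    dists = np.linalg.norm(keys - query, axis=1)
    nn = np.argsort(dists)[:k]
    weights = softmax(-dists[nn])          # closer contexts get more weight

    # Turn neighbour weights into a second distribution over the vocabulary,
    # placed on the tokens that historically followed those contexts
    p_knn = np.zeros(vocab_size)
    for w, tok in zip(weights, next_tokens[nn]):
        p_knn[tok] += w

    # Final prediction: interpolate the two streams
    return lam * p_knn + (1.0 - lam) * p_lm
```

The sketch makes the functional distinction concrete: `p_lm` generalises from the immediate context via learned parameters, while `p_knn` memorises and looks up exact historical continuations from the datastore.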

Updated 2025-09-28

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science