Learn Before
Using Reference Tokens to Define a Vocabulary Distribution in k-NN LM
A common strategy for using the retrieved reference tokens in k-NN language models is to construct a new probability distribution over the vocabulary. This distribution is derived from the nearest neighbors and guides the model's final prediction by incorporating context from the datastore.
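A minimal sketch of this idea, assuming a softmax over negative distances (one common weighting choice; the function name, `temperature` parameter, and input format are illustrative, not from the text):

```python
import math
from collections import defaultdict

def knn_distribution(neighbors, temperature=1.0):
    """Turn retrieved (token, distance) pairs into a probability
    distribution over the retrieved portion of the vocabulary.

    Each neighbor contributes weight exp(-distance / temperature);
    weights for repeated tokens are summed, then normalized.
    Tokens never retrieved implicitly get probability zero.
    """
    weights = defaultdict(float)
    for token, dist in neighbors:
        weights[token] += math.exp(-dist / temperature)
    total = sum(weights.values())
    return {token: w / total for token, w in weights.items()}
```

Closer neighbors (smaller distances) thus receive larger weight, and a token retrieved several times accumulates probability mass across its occurrences.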
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Role of Internal State in Datastore Search
A language model enhanced with a nearest-neighbor mechanism needs to find relevant information from its external datastore to help predict the next word. Arrange the following steps in the correct chronological order to describe how the model retrieves this information.
A language model enhanced with a nearest-neighbor search mechanism is generating text. The model's current internal state, representing the prefix 'The scientist made a groundbreaking...', is used as a query to search an external datastore. The datastore contains pairs of (context representation, associated word). If the search retrieves the three words 'discovery', 'advance', and 'finding' as reference tokens, which statement most accurately describes how these specific words were selected?
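The retrieval step described in this question can be sketched as a brute-force nearest-neighbor search: the model's internal state is compared against every stored context representation, and the words attached to the closest keys are returned. This is a simplified sketch (real systems typically use an approximate index; the function name and L2 distance choice are assumptions):

```python
import math

def retrieve_neighbors(query, datastore, k=3):
    """Return the k (word, distance) pairs whose stored context
    representations lie closest to the query vector.

    `query` is the model's current internal state; `datastore` is a
    list of (context_vector, word) pairs. Distance here is Euclidean.
    """
    scored = []
    for key, word in datastore:
        dist = math.sqrt(sum((q - x) ** 2 for q, x in zip(query, key)))
        scored.append((word, dist))
    scored.sort(key=lambda pair: pair[1])  # closest first
    return scored[:k]
```

The key point the question tests: words like 'discovery', 'advance', and 'finding' are selected because their stored context vectors are nearest to the query vector, not because of any property of the words themselves.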
Learn After
Aggregated Distance Calculation for k-NN Vocabulary Distribution
Linear Interpolation of k-NN and LLM Distributions
Characterizing a Retrieval-Based Probability Distribution
A k-Nearest Neighbors Language Model (k-NN LM) is generating text and needs to predict the next token. It queries its datastore and retrieves the 5 nearest reference tokens, along with their corresponding distances: {"river": 0.1}, {"stream": 0.2}, {"river": 0.3}, {"ocean": 0.8}, {"river": 0.9}. How are these retrieved tokens and their distances used to construct a new probability distribution over the model's vocabulary?
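The scenario above can be computed explicitly. Assuming neighbors are weighted by exp(-distance) and then normalized (a common choice; the temperature of 1.0 is an assumption), the three occurrences of "river" accumulate weight:

```python
import math

# The five retrieved neighbors and distances from the question.
neighbors = [("river", 0.1), ("stream", 0.2), ("river", 0.3),
             ("ocean", 0.8), ("river", 0.9)]

# Weight each neighbor by exp(-distance); repeated tokens accumulate.
weights = {}
for token, dist in neighbors:
    weights[token] = weights.get(token, 0.0) + math.exp(-dist)

# Normalize into a probability distribution over the retrieved tokens.
total = sum(weights.values())
probs = {token: w / total for token, w in weights.items()}
```

Under this scheme "river" receives the largest share, both because it appears three times and because it includes the single closest neighbor; every vocabulary token not retrieved gets probability zero.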
Evaluating a k-NN LM's Intermediate Output