Learn Before
Formula for Memory as a Weighted Moving Average of Keys and Values
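The memory component, $\mathrm{Mem}$, can be computed as a weighted moving average of the last $n_c$ key and value vectors. The weights, denoted by $\beta_1, \dots, \beta_{n_c}$, are applied one per key-value pair in the window, and each sum is normalized by the total weight. At position $i$, this calculation is formally expressed as:

$$\mathrm{Mem} = \left( \frac{\sum_{j=i-n_c+1}^{i} \beta_{j-i+n_c}\, k_j}{\sum_{j=1}^{n_c} \beta_j},\ \frac{\sum_{j=i-n_c+1}^{i} \beta_{j-i+n_c}\, v_j}{\sum_{j=1}^{n_c} \beta_j} \right)$$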
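As a rough illustration, the weighted moving average over the last n_c keys and values can be sketched in plain Python. The function names `weighted_average` and `memory_summary` are hypothetical, chosen only for this example. The sketch assumes positions are 1-based (so `keys[0]` holds k_1), that the position i is at least n_c, and that the weights are ordered from the oldest vector in the window (β_1) to the most recent one (β_{n_c}).

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def weighted_average(vectors: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> Vector:
    """Return sum_j weights[j] * vectors[j] divided by sum_j weights[j]."""
    if len(vectors) != len(weights):
        raise ValueError("need exactly one weight per vector")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    dim = len(vectors[0])
    return [
        sum(w * vec[d] for w, vec in zip(weights, vectors)) / total
        for d in range(dim)
    ]


def memory_summary(keys: Sequence[Sequence[float]],
                   values: Sequence[Sequence[float]],
                   i: int,
                   n_c: int,
                   betas: Sequence[float]) -> Tuple[Vector, Vector]:
    """Summarize the last n_c keys and values ending at 1-based position i.

    Assumption for this sketch: keys[0] is k_1, so positions
    i - n_c + 1 .. i correspond to the slice [i - n_c : i].
    """
    if not (n_c <= i <= len(keys)):
        raise ValueError("position i must satisfy n_c <= i <= len(keys)")
    window_keys = keys[i - n_c:i]
    window_values = values[i - n_c:i]
    return (weighted_average(window_keys, betas),
            weighted_average(window_values, betas))


# Toy data: three positions, two-dimensional keys, one-dimensional values.
keys = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
values = [[10.0], [20.0], [40.0]]

# Window of size 2 at position 3, weighting the most recent pair 3x more.
mem_key, mem_value = memory_summary(keys, values, i=3, n_c=2, betas=[1.0, 3.0])
print(mem_key, mem_value)  # [1.5, 1.75] [35.0]
```

With equal weights the same function reduces to an ordinary moving average, so the choice of β is the only thing that separates the two cases.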

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Formula for Memory as a Weighted Moving Average of Keys and Values
Increasing Coefficients as a Heuristic for Weighted Moving Average
A language model's memory component creates a summary vector of past information using a weighted moving average. The weights are determined by a heuristic that assigns significantly higher importance to more recent information. For a task like summarizing a long, complex article, what is the most probable impact of this specific weighting scheme on the model's output?
Learned vs. Heuristic Weights for Memory Summarization
Configuring Memory for Narrative Coherence
Learn After
A model computes a memory component, Mem, using the following formula for a weighted moving average of the last n_c key (k) and value (v) vectors at a given position i: $\mathrm{Mem} = \left( \frac{\sum_{j=i-n_c+1}^{i} \beta_{j-i+n_c}\, k_j}{\sum_{j=1}^{n_c} \beta_j},\ \frac{\sum_{j=i-n_c+1}^{i} \beta_{j-i+n_c}\, v_j}{\sum_{j=1}^{n_c} \beta_j} \right)$. Given a current position i = 10, a context window size n_c = 4, and weights β = [β_1, β_2, β_3, β_4], which of the following expressions correctly represents the calculation for the summary key vector?
Configuring Memory Component Weights
Calculating the Memory Summary Vector