Configuring Memory Component Weights
A team is developing a language model that uses a memory component to keep track of recent information. The memory is calculated as a pair of summary vectors using the following weighted moving average formula over the last n_c key (k) and value (v) vectors at position i:
Mem = ( (Σ_{j=i-n_c+1}^{i} β_{j-i+n_c} k_j) / (Σ_{j=1}^{n_c} β_j), (Σ_{j=i-n_c+1}^{i} β_{j-i+n_c} v_j) / (Σ_{j=1}^{n_c} β_j) )
The team's goal is for the model to assign greater importance to the most recent information within its memory window compared to older information. Based on the structure of this formula, describe the characteristic that the weight vector β = [β_1, β_2, ..., β_{n_c}] should have to achieve this goal, and explain your reasoning.
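The index mapping in the formula can be checked with a small sketch. This is a minimal illustration under stated assumptions, not the team's implementation: the function name `memory_summary`, the placeholder at list index 0 (used so positions match the formula's 1-based indexing), and the toy one-dimensional keys and values are all hypothetical.

```python
def memory_summary(keys, values, betas, i):
    """Weighted moving average of the last n_c keys and values ending at position i.

    keys and values are lists of vectors indexed from 1, as in the formula
    (index 0 is an unused placeholder). betas holds [β_1, ..., β_{n_c}].
    """
    n_c = len(betas)
    start = i - n_c + 1  # oldest position inside the window
    if start < 1:
        raise ValueError("window extends before position 1")
    total = sum(betas)   # normalizer Σ_{j=1}^{n_c} β_j

    def weighted_avg(vectors):
        dim = len(vectors[i])
        out = [0.0] * dim
        for j in range(start, i + 1):
            # β_{j-i+n_c} in the formula; minus 1 for the 0-based Python list.
            # j = start picks β_1 (oldest), j = i picks β_{n_c} (most recent).
            w = betas[j - i + n_c - 1]
            for d in range(dim):
                out[d] += w * vectors[j][d]
        return [x / total for x in out]

    return weighted_avg(keys), weighted_avg(values)


# Toy data (assumed): key at position j is [j], value at position j is [10 * j].
keys = [None] + [[float(j)] for j in range(1, 11)]
values = [None] + [[10.0 * j] for j in range(1, 11)]

increasing = memory_summary(keys, values, [1, 2, 3, 4], i=10)
uniform = memory_summary(keys, values, [1, 1, 1, 1], i=10)
decreasing = memory_summary(keys, values, [4, 3, 2, 1], i=10)

print(increasing)  # ([9.0], [90.0])
print(uniform)     # ([8.5], [85.0])
print(decreasing)  # ([8.0], [80.0])
```

With the window covering positions 7 to 10, the increasing weights pull the summary toward the newest key (10), while the decreasing weights pull it toward the oldest (7). This reflects the fact that β_{n_c} multiplies the vector at position i and β_1 multiplies the vector at position i-n_c+1.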
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A model computes a memory component, Mem, using the following formula for a weighted moving average of the last n_c key (k) and value (v) vectors at a given position i:
Mem = ( (Σ_{j=i-n_c+1}^{i} β_{j-i+n_c} k_j) / (Σ_{j=1}^{n_c} β_j), (Σ_{j=i-n_c+1}^{i} β_{j-i+n_c} v_j) / (Σ_{j=1}^{n_c} β_j) )
Given a current position i=10, a context window size n_c=4, and weights β = [β_1, β_2, β_3, β_4], which of the following expressions correctly represents the calculation for the summary key vector?
Configuring Memory Component Weights
Calculating the Memory Summary Vector