Learn Before
Neural Network as a Memory Component
The memory component, Mem, in an attention mechanism can be implemented as a neural network. The network operates recurrently: at each step it takes its own previous output (the prior memory state) and the current states of the main model as inputs, and produces the new memory state.
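
The recurrent update described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the formulation used in the course: the single tanh layer, the weight matrices W_MEM and W_STATE, the dimensions, and the function names are all hypothetical choices made for the example.

```python
import math
import random

random.seed(0)

# Hypothetical sizes, chosen only for illustration.
D_MEM, D_STATE = 4, 3


def rand_matrix(rows, cols):
    """Small random weight matrix (stand-in for learned parameters)."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


# Assumed parameters of a one-layer memory network.
W_MEM = rand_matrix(D_MEM, D_MEM)      # acts on the previous memory state
W_STATE = rand_matrix(D_MEM, D_STATE)  # acts on the main model's current state


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def update_memory(mem_prev, state):
    """One recurrent step: new memory from (previous memory, current state)."""
    from_mem = matvec(W_MEM, mem_prev)
    from_state = matvec(W_STATE, state)
    return [math.tanh(a + b) for a, b in zip(from_mem, from_state)]


def run(states):
    """Feed a sequence of model states through the memory network."""
    mem = [0.0] * D_MEM  # initial memory state (assumed to be zeros)
    for state in states:
        # The network's own previous output is fed back in as an input.
        mem = update_memory(mem, state)
    return mem


states = [[random.gauss(0.0, 1.0) for _ in range(D_STATE)] for _ in range(5)]
final_mem = run(states)
```

Note that the memory keeps the same size (D_MEM) no matter how many steps are processed, and each step needs only the previous memory state and the current model state, not the earlier inputs themselves.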

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Neural Network as a Memory Component
Segment-Level Recurrence for Memory Models
A memory-based attention mechanism updates its fixed-size memory state, Mem, at each time step i using a general recurrent formula: Mem_new = f((k_i, v_i), Mem_old), where (k_i, v_i) is the current key-value pair and Mem_old is the memory state from the previous step. Which of the following update procedures does NOT conform to this recurrent structure?
Calculating a Recurrent Memory State
Consider a memory update process defined by the recurrent function Mem_new = f((k_i, v_i), Mem_old), where (k_i, v_i) is the input at the current step and Mem_old is the memory state from the previous step. To compute the memory state for step 100, this process requires direct access to the individual key-value pairs from all 99 preceding steps (i.e., from step 1 to 99).
Formula for Memory as a Cumulative Average of Keys and Values
Learn After
Formula for Neural Network Memory Update
A computational model is designed to process a sequence of items one by one. To keep a running summary of the sequence, it uses a specific neural network as a memory component. At each step, this network updates its internal state. Suppose at step t=5, the memory network has just produced an output representing its state, which we'll call Mem_prior. The main model has also processed the fifth item in the sequence, resulting in a current state representation called S_current. To generate the new memory state for the next step, what inputs should be fed into the memory network?
An attention mechanism uses a neural network to maintain a memory of the information it has processed. Arrange the following events in the correct chronological order for a single update step of this memory component.
An engineer is developing a text summarization model that processes a document sentence by sentence. The model uses a special neural network as a memory component to keep track of the document's overall context. The engineer observes that the model generates excellent summaries for short articles but produces incoherent summaries for long articles, often forgetting information from the initial paragraphs. The main model components responsible for processing individual sentences are confirmed to be working correctly. Based on this observation, which of the following is the most likely malfunction within the memory component's update process?