1Cademy - A language model architecture combines information from two sources: an immediate context output and a retrieved knowledge output. It uses a learned gating vector, `g`, to dynamically weigh these sources. The final output is calculated using the formula: `Output = g ⊙ [immediate_context_output] + (1 - g) ⊙ [retrieved_knowledge_output]`, where `⊙` is element-wise multiplication. If, during a specific task, the values in the gating vector `g` are consistently close to 0.0, what does this imply about the model's behavior for that task?

Learn Before

Gated Combination of Local and k-NN Attention

Multiple Choice

A language model architecture combines information from two sources: an 'immediate context' output and a 'retrieved knowledge' output. It uses a learned gating vector, g, to dynamically weigh these sources. The final output is calculated using the formula: Output = g ⊙ [immediate_context_output] + (1 - g) ⊙ [retrieved_knowledge_output], where ⊙ is element-wise multiplication. If, during a specific task, the values in the gating vector g are consistently close to 0.0, what does this imply about the model's behavior for that task?
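The formula above can be checked numerically with a minimal sketch in plain Python. The function name `gated_combination` and all the numbers are hypothetical, chosen only for illustration; a real model would learn `g` and operate on tensors rather than lists.

```python
def gated_combination(g, immediate, retrieved):
    """Return g * immediate + (1 - g) * retrieved, computed element-wise."""
    if not (len(g) == len(immediate) == len(retrieved)):
        raise ValueError("all three vectors must have the same length")
    return [gi * a + (1.0 - gi) * b for gi, a, b in zip(g, immediate, retrieved)]


# Hypothetical values, not taken from any real model.
immediate_context_output = [1.0, 2.0, 3.0]
retrieved_knowledge_output = [10.0, 20.0, 30.0]

# A gate whose entries are all close to 0.0, as in the question.
gate_near_zero = [0.01, 0.02, 0.0]

output = gated_combination(
    gate_near_zero, immediate_context_output, retrieved_knowledge_output
)
print(output)
```

With these assumed numbers, each element of `output` sits much closer to the corresponding element of `retrieved_knowledge_output` than to `immediate_context_output`, because the weight `1 - g` on the retrieved term is close to 1 while the weight `g` on the immediate term is close to 0.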

Updated 2025-09-29
