Learn Before
Multiple Choice

An engineer is debugging a self-attention layer in a language model. They observe that the attention scores (the weights assigned to each input) are being calculated correctly. However, they decide to run an experiment where they keep the Query and Key vectors for each input token exactly the same, but they replace every Value vector with a vector of all zeros. Which of the following outcomes is the most certain result of this specific change?

0

1

Updated 2025-10-08

Contributors are:

Who are from:

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science