Multiple Choice

In a self-attention mechanism, instead of directly comparing the raw input vectors of a sequence, each input vector is first multiplied by three separate, learned parameter matrices. This process creates three distinct representations of the original vector before they are used to calculate attention scores and output values. What is the primary analytical advantage of this approach over simply comparing the original input vectors to each other?
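
For reference, the mechanism the question describes can be sketched in a few lines of plain Python. This is only an illustrative sketch, not a definitive implementation: the names W_q, W_k, W_v, the toy dimensions, and the random (untrained) matrices are all assumptions, and the scaling by the square root of the key dimension followed by a softmax is the common scaled dot-product form, which the question itself does not specify.

```python
import math
import random

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X, W_q, W_k, W_v):
    """Project each input vector three ways, then attend.

    X has one row per sequence position. W_q, W_k, W_v are hypothetical
    names for the three learned parameter matrices.
    """
    Q = matmul(X, W_q)  # representation used to score other positions
    K = matmul(X, W_k)  # representation that gets scored
    V = matmul(X, W_v)  # representation that is mixed into the output
    d_k = len(K[0])
    K_T = [list(col) for col in zip(*K)]
    # Assumed scaled dot-product form: scores[i][j] = (Q[i] . K[j]) / sqrt(d_k)
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(Q, K_T)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, V), weights

def random_matrix(rows, cols, rng):
    """Stand-in for learned parameters: random values, not trained ones."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
seq_len, d_model, d_k = 4, 8, 5  # toy sizes chosen for illustration
X = random_matrix(seq_len, d_model, rng)
W_q = random_matrix(d_model, d_k, rng)
W_k = random_matrix(d_model, d_k, rng)
W_v = random_matrix(d_model, d_k, rng)

output, weights = self_attention(X, W_q, W_k, W_v)
print(len(output), len(output[0]))    # rows and columns of the output
print(len(weights), len(weights[0]))  # rows and columns of the attention weights
```

Passing identity matrices for all three projections reduces the scores to comparisons between the original input vectors, which is the baseline the question contrasts against.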

Updated 2025-09-26

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science