Learn Before
  • Attention Output as a Weighted Sum of Values

Case Study

Calculating an Attention Output Vector

Given the scenario below, calculate the final output vector corresponding to the first query vector.
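The scenario's specific query, key, and value numbers did not survive this page capture, so the sketch below uses hypothetical placeholder values. It only illustrates the general recipe: score each key against the query, scale by the square root of the key dimension, apply softmax, and take the resulting weighted sum of the value vectors.

import numpy as np

def attention_output(q, K, V):
    # q: (d,) query vector; K: (n, d) key rows; V: (n, d_v) value rows
    d = q.shape[0]
    scores = K @ q / np.sqrt(d)              # compatibility score per position, shape (n,)
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights = weights / weights.sum()
    return weights @ V                       # weighted sum of value rows

# Hypothetical stand-in scenario: 3 positions, 2-dimensional keys and values.
q = np.array([1.0, 0.0])
K = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
V = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
print(attention_output(q, K, V))

To answer the case study itself, substitute its given first query vector and its key and value matrices for the placeholders above.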

Updated 2025-09-29

Contributors: Gemini AI (Google)

Tags

  • Ch.2 Generative Models - Foundations of Large Language Models
  • Foundations of Large Language Models
  • Foundations of Large Language Models Course
  • Computing Sciences
  • Application in Bloom's Taxonomy
  • Cognitive Psychology
  • Psychology
  • Social Science
  • Empirical Science
  • Science

Related
  • Distributed Computation of Weighted Value Sums

  • Single-Query Attention Computation with Multiplicative Scaling

  • Calculating an Attention Output Vector

  • In a self-attention mechanism, the output for a given input element is a weighted sum of 'value' vectors from all elements in the sequence. Consider the calculation for the word 'sat' in the phrase 'The cat sat on the mat'. If the attention weights from 'sat' to the words are: 'The': 0.05, 'cat': 0.45, 'sat': 0.05, 'on': 0.0, 'the': 0.0, 'mat': 0.45. Which of the following statements best describes the resulting output vector for 'sat'? (A numerical sketch of this weighted sum appears after this list.)

  • In a self-attention mechanism, the output for a specific token is calculated as a weighted sum of 'value' vectors from all tokens in the sequence. If the attention weight connecting a query token to a specific value token is exactly zero, that value token has no contribution to the final output for the query token.

  • Sequence Parallelism
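
A minimal numerical sketch of the two statements above, using the 'sat' attention weights from the first related question. The value vectors are hypothetical 2-dimensional placeholders (the question does not specify any); the point is that the output is a weighted sum of all six value rows, and tokens with weight 0.0 drop out entirely.

import numpy as np

tokens = ["The", "cat", "sat", "on", "the", "mat"]
weights = np.array([0.05, 0.45, 0.05, 0.0, 0.0, 0.45])  # attention weights from 'sat'

# Hypothetical value vectors, one row per token (placeholders for illustration).
V = np.array([[0.2, 0.4],   # The
              [1.0, 0.0],   # cat
              [0.5, 0.5],   # sat
              [9.0, 9.0],   # on  (weight 0.0: contributes nothing)
              [9.0, 9.0],   # the (weight 0.0: contributes nothing)
              [0.0, 1.0]])  # mat

output = weights @ V  # weighted sum of value vectors
print(output)         # [0.485, 0.495]: dominated by 'cat' and 'mat', unaffected by 'on' and 'the'

Because 'cat' and 'mat' carry weight 0.45 each, the output vector lies closest to a blend of their value vectors, while the zero-weight tokens could hold any values without changing the result.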
