Learn Before
Distributed Attention Calculation Scenario
An attention mechanism's output for a single query is being calculated across two separate computational nodes. Given the partial sets of value vectors (v) and their corresponding attention weights (α) below, what is the final aggregated output vector?
0
1
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Computing Sciences
Foundations of Large Language Models Course
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Distributed Attention Calculation Scenario
A large-scale model's attention mechanism computes its output by partitioning the value vectors across multiple computational nodes. Each node calculates a partial weighted sum using its local subset of value vectors. Which statement best analyzes the relationship between the partial sums and the final attention output?
A large-scale model needs to compute a final output vector, which is defined as a weighted sum of many different value vectors. To speed up this calculation, the set of value vectors is split across multiple computational nodes. Arrange the following steps in the correct chronological order to describe this distributed computation process.