Learn Before
In a simple self-attention mechanism where similarity is measured by dot product and weights are normalized by a softmax function, if a current input vector x_i is perfectly orthogonal to a preceding input vector x_j, then x_j will have zero influence on the final output vector y_i.
0
1
Tags
Data Science
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Parameter Matrices for Attention Transformations
Introduce weight matrices in the transformer
Calculating an Output Vector in a Simple Sequence Model
You are calculating the output vector y_i for a single input vector x_i in a sequence, using a simple self-attention mechanism that only considers preceding elements. Arrange the following computational steps in the correct chronological order.
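The computation the card describes can be sketched in a few lines. This is a minimal illustrative sketch (the function name and example vectors are assumptions, not the course's reference code): score each preceding vector by its dot product with x_i, normalize the scores with a softmax, then form y_i as the weighted sum.

```python
import numpy as np

def simple_self_attention_output(xs, i):
    """Sketch: compute y_i attending over x_0..x_i with dot-product scores."""
    context = xs[: i + 1]                    # only the current and preceding vectors
    scores = context @ xs[i]                 # step 1: dot-product similarities
    weights = np.exp(scores - scores.max())  # step 2: softmax normalization
    weights /= weights.sum()
    return weights @ context                 # step 3: weighted sum of the inputs

# Even when x_0 is orthogonal to x_1 (dot product 0), softmax maps the zero
# score to a positive weight, so x_0 still contributes to y_1.
xs = np.array([[0.0, 1.0], [1.0, 0.0]])      # x_0 orthogonal to x_1
y1 = simple_self_attention_output(xs, 1)
```

Note that the softmax turns a zero similarity score into a nonzero attention weight (e^0 = 1 before normalization), which is the key fact the true/false card above is probing.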