Learn Before
In a sequence-to-sequence model, an attention mechanism calculates a score for each of three input vectors (A, B, and C) relative to a single output vector (D). The scoring function is the simple dot product between the output vector and each input vector. You are given the following geometric relationships:
- Vector A points in a very similar direction to Vector D.
- Vector B is orthogonal (at a 90-degree angle) to Vector D.
- Vector C points in the opposite direction of Vector D.
Which input vector will receive the highest attention score, and what is the underlying reason for this?
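For concreteness, here is a minimal Python sketch of this scoring step. The specific 2-D vectors are illustrative assumptions chosen to match the geometry described above; they are not part of the card.

```python
import numpy as np

# A minimal sketch (vectors assumed for illustration, not given in the card):
# dot-product attention scores grow with directional alignment, since
# q . k = |q| |k| cos(theta).
D = np.array([1.0, 0.0])  # output (query) vector
inputs = {
    "A": np.array([0.9, 0.1]),   # nearly parallel to D  -> large positive score
    "B": np.array([0.0, 1.0]),   # orthogonal to D       -> score of 0
    "C": np.array([-1.0, 0.0]),  # anti-parallel to D    -> negative score
}

scores = np.array([np.dot(D, v) for v in inputs.values()])
weights = np.exp(scores) / np.exp(scores).sum()  # softmax over the scores

for name, s, w in zip(inputs, scores, weights):
    print(f"{name}: score={s:+.2f}, attention weight={w:.2f}")
```

Running the sketch shows the score ordering the geometry implies: the nearly parallel vector scores highest, the orthogonal vector scores zero, and the anti-parallel vector scores negative, so the softmax assigns the largest attention weight to the most aligned input.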
Tags
Data Science
Foundations of Large Language Models Course
Computing Sciences
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Example of Predicting Masked Words: Kitten Playing
Example of Masked Language Modeling: Kitten Chasing Ball
Example of Context-Based Prediction: Kitten Chasing Ball
Evaluating Attention Mechanisms in Machine Translation
Calculating a Dot Attention Score