An engineer is building a translation model. The core of the model is a mechanism that, for each word, computes a new representation by taking a weighted sum of the representations of all words in the sentence. The engineer observes that the model produces exactly the same internal representation for the phrases 'the old man's car' and 'the man's old car'. What is the most probable reason for this behavior?
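The behavior in the question can be reproduced directly: with token embeddings alone (no positional encoding), a self-attention layer is permutation-invariant per word. A minimal NumPy sketch, assuming a hypothetical toy vocabulary and random fixed embedding and projection matrices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary and parameters (assumption: random fixed weights).
vocab = {w: i for i, w in enumerate(["the", "old", "man's", "car"])}
d = 8
E = rng.normal(size=(len(vocab), d))               # embedding table
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def self_attention(tokens):
    """Single self-attention layer with NO positional encoding."""
    X = E[[vocab[t] for t in tokens]]              # (n, d) token embeddings only
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d)                  # scaled dot-product scores
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over positions
    return weights @ V                             # (n, d) output representations

s1 = ["the", "old", "man's", "car"]
s2 = ["the", "man's", "old", "car"]
out1, out2 = self_attention(s1), self_attention(s2)

# Each word's output depends only on the *set* of words, not their order:
for w in set(s1):
    assert np.allclose(out1[s1.index(w)], out2[s2.index(w)])
```

Because the softmax and the weighted sum range over an unordered set of key/value vectors, permuting the input permutes the output rows but leaves each word's representation unchanged, which is exactly what the engineer observes.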
Tags
Data Science
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Consider a language model that uses a standard self-attention mechanism but lacks any method for encoding word positions. The model is given two distinct input sentences:
Sentence 1: 'A dog chases a cat.'
Sentence 2: 'A cat chases a dog.'
After these sentences pass through a single self-attention layer, how would the final output representation for the word 'chases' compare between the two sentences?
Debugging a Permutation-Invariant Model