Debugging a Permutation-Invariant Model
A data scientist is debugging a text-processing model that uses a stack of self-attention layers. They observe that the model produces the exact same output for the sentences 'The delivery truck blocked the driveway' and 'The driveway blocked the delivery truck'. The scientist confirms that the input words are correctly converted into their initial vector representations before being passed to the attention layers. Based on this information, identify the most likely missing component and explain where it should be added in the model's architecture to fix this issue.
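One way to see the symptom concretely is with the minimal NumPy sketch below. It is a toy, not the scientist's actual model: it assumes a single self-attention layer with identity query/key/value projections, made-up word embeddings, and a sinusoidal positional encoding; the helper names are illustrative only. Without any positional signal, permuting the input tokens merely permutes the output rows, so the two sentences carry the same bag of representations. Adding a positional encoding to the embeddings, immediately before the first attention layer, breaks that symmetry.

```python
import numpy as np

def self_attention(X):
    """Single-head self-attention with identity Q/K/V projections (illustration only)."""
    scores = X @ X.T / np.sqrt(X.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ X

def sinusoidal_positional_encoding(n_positions, d_model):
    """Standard sinusoidal encoding; any position-dependent signal would do for this demo."""
    pos = np.arange(n_positions)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

rng = np.random.default_rng(0)
d_model = 8
vocab = {w: rng.normal(size=d_model)
         for w in ["the", "delivery", "truck", "blocked", "driveway"]}

s1 = ["the", "delivery", "truck", "blocked", "the", "driveway"]
perm = [0, 5, 3, 4, 1, 2]   # reorders s1 into "the driveway blocked the delivery truck"
X1 = np.stack([vocab[w] for w in s1])
X2 = X1[perm]

# Without positions, self-attention is permutation-equivariant: permuting the input
# rows just permutes the output rows, so the model only "sees" a bag of words.
print(np.allclose(self_attention(X2), self_attention(X1)[perm]))             # True

# Adding positional encodings to the embeddings, before the first attention layer,
# breaks this symmetry: each token now also encodes where it occurs.
pe = sinusoidal_positional_encoding(len(s1), d_model)
print(np.allclose(self_attention(X2 + pe), self_attention(X1 + pe)[perm]))   # False
```

In other words, the fix is to inject position information at the point where the token embeddings enter the attention stack, whether via learned position embeddings, sinusoidal encodings, or a rotary scheme applied inside each attention layer.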
Tags
Data Science
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Related
Consider a language model that uses a standard self-attention mechanism but lacks any method for encoding word positions. The model is given two distinct input sentences:
Sentence 1: 'A dog chases a cat.'
Sentence 2: 'A cat chases a dog.'
After these sentences pass through a single self-attention layer, how would the final output representation for the word 'chases' compare between the two sentences?
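A quick numeric check of the expected answer, under the same toy assumptions as above (made-up embeddings, identity projections, and a purely illustrative helper name): because the output for 'chases' is a softmax-weighted sum over the same multiset of token vectors in both sentences, the two representations come out identical.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
emb = {w: rng.normal(size=d) for w in ["a", "dog", "chases", "cat"]}

def attn_output_for(query_word, sentence):
    """Output row of one self-attention layer (identity projections) for query_word."""
    X = np.stack([emb[w] for w in sentence])
    q = emb[query_word]
    scores = X @ q / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ X

out1 = attn_output_for("chases", ["a", "dog", "chases", "a", "cat"])
out2 = attn_output_for("chases", ["a", "cat", "chases", "a", "dog"])
print(np.allclose(out1, out2))   # True: same weighted sum over the same bag of vectors
```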
An engineer is building a translation model. The core of the model is a mechanism that, for each word, computes a new representation as a weighted sum of the representations of all the other words in the sentence. The engineer observes that the model produces exactly the same internal representation for the phrases 'the old man's car' and 'the man's old car'. What is the most probable reason for this behavior?
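The same point can be read directly off the standard scaled dot-product attention formula, assuming that is what the engineer's weighted-sum mechanism amounts to:

z_i = \sum_{j} \operatorname{softmax}_j\!\left(\frac{q_i^\top k_j}{\sqrt{d_k}}\right) v_j

A sum is unchanged by reordering its terms, so without position information each z_i depends only on the multiset of key/value vectors. The two phrases present the same multiset of words, so every z_i, and hence the whole internal representation, is identical.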