Debugging a Text-Pair Similarity Model
An engineer is building a model to predict a semantic similarity score between two sentences. They format the input as [CLS] Sentence A [SEP] Sentence B [SEP]. After the input passes through the main model layers, they take the final hidden states corresponding to all the tokens in 'Sentence A', average them to create a single vector, and feed this vector into a final prediction network to get the similarity score. Based on the standard architecture for this type of task, identify the primary flaw in the engineer's approach and explain why the standard method is preferred.
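The contrast between the two pooling strategies can be shown with a minimal NumPy sketch. The hidden states and prediction head below are random toy values, not from a real model; positions assume a hypothetical 7-token sequence [CLS] A1 A2 [SEP] B1 B2 [SEP].

```python
import numpy as np

# Toy final hidden states for: [CLS] A1 A2 [SEP] B1 B2 [SEP]
# Shape: (seq_len=7, hidden_dim=4). Values are illustrative only.
rng = np.random.default_rng(0)
hidden_states = rng.standard_normal((7, 4))

# Engineer's approach: mean-pool only Sentence A's tokens (positions 1-2).
# This vector is built exclusively from Sentence A's hidden states.
sent_a_vec = hidden_states[1:3].mean(axis=0)

# Standard approach: take the [CLS] vector (position 0). Self-attention
# lets [CLS] attend to tokens from *both* sentences, so its final hidden
# state is trained to summarize their interaction.
cls_vec = hidden_states[0]

# Either vector could feed a prediction head, e.g. one linear layer
# (hypothetical random weights) producing a similarity score.
W = rng.standard_normal((4, 1))
score_engineer = sent_a_vec @ W
score_standard = cls_vec @ W
```

The key point the sketch makes concrete: both pooled vectors have the same shape and can feed the same head, but only the [CLS] vector is the position the pre-training objective dedicates to a whole-sequence (pair-level) summary, whereas averaging Sentence A's states discards the explicit aggregation point for the A-B relationship.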
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Related
A developer is building a model to determine if two sentences are paraphrases of each other. The model takes the two sentences as a single input, formatted with special tokens: [CLS] Sentence A [SEP] Sentence B [SEP]. After processing, the model produces a final hidden-state vector for every token in the input sequence. To make the final classification decision (i.e., 'paraphrase' or 'not a paraphrase'), which vector should be passed to the final prediction network to best represent the relationship between the two sentences?
You are using a large, pre-trained transformer model to perform a text-pair classification task, such as determining if a given sentence is a valid answer to a question. Arrange the following steps in the correct chronological order to describe the model's process from receiving the two text inputs to producing a final classification label.
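One plausible chronological ordering of the pipeline described above can be written out as a simple list; the step wording here is illustrative, not the exact answer key from the source.

```python
# Hypothetical ordering of a text-pair classification pipeline
# with a pre-trained Transformer encoder (step names are assumptions).
steps = [
    "Concatenate the inputs as [CLS] Sentence A [SEP] Sentence B [SEP]",
    "Tokenize and convert tokens to embeddings (token + position + segment)",
    "Pass the embeddings through the stacked Transformer encoder layers",
    "Extract the final hidden state of the [CLS] token",
    "Feed the [CLS] vector through the classification head",
    "Apply softmax and output the predicted label",
]

# Tokenization must precede encoding, and pooling must precede the head.
assert steps.index("Tokenize and convert tokens to embeddings (token + position + segment)") \
    < steps.index("Pass the embeddings through the stacked Transformer encoder layers")
```

The ordering constraint checked at the end captures the core dependency the exercise tests: the [CLS] vector only exists after the full encoder stack has run, so pooling and classification are necessarily the final steps.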