Learn Before
Debugging a Sentence-Pair Model
An engineer is training a model to perform a sentence-pair task, such as determining if one sentence logically follows another. The input to the model is a single sequence created by concatenating the two sentences. The engineer observes that the model is struggling to learn the relationship between the sentences. Upon inspection, they find that each token's input vector is created by summing only two components: a vector for the token's identity and a vector for the token's position in the sequence.
Based on this information, what critical piece of information is missing from the token representations, and why does its absence hinder the model's ability to understand the relationship between the two sentences?
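To make the scenario concrete, here is a minimal NumPy sketch of a BERT-style input embedding composition. All names and sizes (VOCAB_SIZE, EMB_DIM, the random tables, the example token IDs) are illustrative assumptions, not the engineer's actual model; it only shows how a third, segment-level vector would be summed alongside the token and position vectors described above.

```python
# Illustrative sketch of input embedding composition for a sentence-pair
# model. All table sizes and IDs below are made-up assumptions.
import numpy as np

rng = np.random.default_rng(0)

VOCAB_SIZE, MAX_LEN, NUM_SEGMENTS, EMB_DIM = 100, 16, 2, 8

token_emb = rng.normal(size=(VOCAB_SIZE, EMB_DIM))     # one row per token ID
position_emb = rng.normal(size=(MAX_LEN, EMB_DIM))     # one row per position
segment_emb = rng.normal(size=(NUM_SEGMENTS, EMB_DIM)) # row 0 = sentence A, row 1 = sentence B

token_ids = np.array([2, 5, 9, 3, 7, 11, 3])   # e.g. [CLS] sent1 ... [SEP] sent2 ... [SEP]
segment_ids = np.array([0, 0, 0, 0, 1, 1, 1])  # first sentence -> 0, second -> 1
positions = np.arange(len(token_ids))

# With only two components, nothing in a token's vector says which
# sentence it came from -- the situation described in the scenario.
incomplete = token_emb[token_ids] + position_emb[positions]

# Summing in a segment embedding attaches an explicit sentence label
# to every token's representation.
full = incomplete + segment_emb[segment_ids]

print(full.shape)  # -> (7, 8)
```

The key point the sketch makes: `incomplete` is identical whether a token sits in the first or second sentence, so only the three-way sum lets the model tell the sentences apart.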
Tags
Ch.1 Pre-training - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Example of Input Embedding Composition for a Sentence Pair
A model processes a two-sentence input: 'The sky is blue. [SEP] Grass is green.'. To help the model distinguish between the two sentences, it uses a specific vector, Vec_A, for the first sentence and another vector, Vec_B, for the second. How are these vectors assigned to the tokens in the combined input sequence?

Debugging a Sentence-Pair Model
A model is given a two-sentence input: 'What is the capital of France? [SEP] Paris is the capital.'. The model uses one vector representation for the first sentence (let's call it Vec_A) and a different one for the second sentence (Vec_B). For the tokenized sequence below, what is the correct sequence of these vector labels that would be assigned to each token?

[CLS] What is the capital of France ? [SEP] Paris is the capital . [SEP]

The correct sequence is: ____. (Use a comma and space to separate labels, e.g., Label1, Label2)