Case Study

Debugging a Sentence-Pair Model

An engineer is training a model to perform a sentence-pair task, such as determining if one sentence logically follows another. The input to the model is a single sequence created by concatenating the two sentences. The engineer observes that the model is struggling to learn the relationship between the sentences. Upon inspection, they find that each token's input vector is created by summing only two components: a vector for the token's identity and a vector for the token's position in the sequence.
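The input construction described above can be sketched in a few lines. This is a minimal sketch that assumes toy sizes, a tiny hypothetical vocabulary, and random vectors standing in for learned embedding tables. None of these names or values come from the case study. It shows only the two-component sum the engineer found: each token's vector is its identity vector plus its position vector, computed over the plain concatenation of the two sentences.

```python
import random

# Hypothetical toy settings; the case study does not specify any of these.
VOCAB = ["the", "cat", "sat", "it", "was", "tired"]
MAX_LEN = 16  # assumed maximum sequence length
DIM = 4       # assumed embedding width

random.seed(0)
# Stand-ins for learned lookup tables: random vectors for illustration only.
token_table = {tok: [random.gauss(0, 1) for _ in range(DIM)] for tok in VOCAB}
position_table = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(MAX_LEN)]

def embed(tokens):
    """Input vector per token = token-identity vector + position vector."""
    return [
        [t + p for t, p in zip(token_table[tok], position_table[i])]
        for i, tok in enumerate(tokens)
    ]

sentence_a = ["the", "cat", "sat"]
sentence_b = ["it", "was", "tired"]
# The two sentences are concatenated into a single input sequence.
pair = sentence_a + sentence_b
vectors = embed(pair)
print(len(vectors), len(vectors[0]))
```

Running it prints `6 4`: one 4-dimensional vector for each of the six tokens in the concatenated sequence.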

Based on this information, what critical piece of information is missing from the token representations, and why does its absence hinder the model's ability to understand the relationship between the two sentences?

Updated 2025-10-03

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science