1Cademy - Diagnosing an Input Vector Mismatch

Learn Before

Embedding Size in Transformer Models

Short Answer

Diagnosing an Input Vector Mismatch

An NLP team is adapting a pre-trained language model that uses a 768-dimensional vector space for its internal representations. To incorporate new information, they generate a separate 100-dimensional feature vector for each token. They attempt to combine these by directly summing the 100-dimensional vector with the model's 768-dimensional input vector for each token. The model fails to train. What is the fundamental mathematical reason for this failure, and what is the standard method to correctly integrate the new feature vector?
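The scenario can be sketched in a few lines of plain Python. This is a minimal illustration, not the team's actual code: the helper names (`add_vectors`, `project`), the random values, and the matrix `W` are all assumptions made for the example. It shows that element-wise addition is undefined for vectors of different lengths, and that one common remedy is a learned linear projection mapping the 100-dimensional features into the 768-dimensional space before summing.

```python
import random

D_MODEL = 768  # model embedding size, taken from the question
D_FEAT = 100   # size of the new per-token feature vector, taken from the question

def add_vectors(a, b):
    """Element-wise sum; only defined when both vectors have the same length."""
    if len(a) != len(b):
        raise ValueError(f"cannot add vectors of length {len(a)} and {len(b)}")
    return [x + y for x, y in zip(a, b)]

def project(W, v):
    """Multiply a (rows x cols) matrix W by a length-cols vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

rng = random.Random(0)
token_embedding = [rng.gauss(0, 1) for _ in range(D_MODEL)]  # stand-in for the model's input vector
feature = [rng.gauss(0, 1) for _ in range(D_FEAT)]           # stand-in for the new feature vector

# Direct summation: the dimensions differ, so the operation is undefined.
try:
    add_vectors(token_embedding, feature)
    direct_sum_ok = True
except ValueError:
    direct_sum_ok = False

# Hypothetical projection matrix W of shape 768 x 100. Here it is random;
# in practice it would be a trainable parameter learned during fine-tuning.
W = [[rng.gauss(0, 0.02) for _ in range(D_FEAT)] for _ in range(D_MODEL)]

projected = project(W, feature)                     # now 768-dimensional
combined = add_vectors(token_embedding, projected)  # sum is well defined

print(direct_sum_ok, len(projected), len(combined))
```

In a deep learning framework the same idea would typically be a single trainable linear layer from 100 to 768 dimensions applied to every token's feature vector, with its output added to the token embedding.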

Updated 2025-10-07

