Short Answer

Diagnosing an Input Vector Mismatch

An NLP team is adapting a pre-trained language model that uses a 768-dimensional vector space for its internal representations. To incorporate new information, they generate a separate 100-dimensional feature vector for each token. They attempt to combine these by directly summing the 100-dimensional vector with the model's 768-dimensional input vector for each token. The model fails to train. What is the fundamental mathematical reason for this failure, and what is the standard method to correctly integrate the new feature vector?
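The scenario can be reproduced in a few lines as a rough illustration. The sketch below uses NumPy and assumes a sequence of 4 tokens; the names `token_inputs`, `new_features`, and `W` are hypothetical, and the random projection matrix `W` only stands in for weights that would normally be learned during training. It shows that element-wise addition is undefined between vectors of different dimensionality, and that one common remedy is to map the 100-dimensional features into the 768-dimensional space with a linear projection before summing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions from the question; seq_len is an assumption for illustration.
d_model, d_feat, seq_len = 768, 100, 4

token_inputs = rng.normal(size=(seq_len, d_model))  # model input vectors
new_features = rng.normal(size=(seq_len, d_feat))   # extra per-token features

# Direct summation: shapes (4, 768) and (4, 100) are incompatible,
# so NumPy raises a ValueError instead of producing a result.
try:
    token_inputs + new_features
    direct_sum_ok = True
except ValueError:
    direct_sum_ok = False

# Hypothetical projection matrix (100 x 768). Random here; in a real
# model it would be a trainable parameter.
W = rng.normal(scale=0.02, size=(d_feat, d_model))

projected = new_features @ W          # shape (4, 768)
combined = token_inputs + projected   # element-wise sum is now defined

print(direct_sum_ok, combined.shape)
```

Concatenating the two vectors and then projecting back to 768 dimensions is another option; the projection-then-sum form is shown here because it leaves the model's expected input width unchanged.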

Updated 2025-10-07

Tags

Ch.1 Pre-training - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science
