Multiple Choice

An engineer trains a sequence-processing model on a dataset whose longest text is 512 tokens. The model performs well on texts up to this length. However, when tested on a 1000-token document, the model's output becomes incoherent for the latter half of the text. A visualization of the numerical signals used to represent token positions shows a clear, repeating pattern for the first 512 positions, but a chaotic, noisy pattern for all positions thereafter. What is the most likely explanation for this specific failure mode?
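The symptom described in the question can be reproduced with a small numeric sketch: rows of a learned position-embedding table that received gradient updates look smooth and structured, while rows beyond the trained range are still at their random initialization. All names here are illustrative, and a sinusoidal pattern is used only as a stand-in for what trained rows might look like.

```python
import numpy as np

rng = np.random.default_rng(0)
max_trained_len, d_model, total_len = 512, 64, 1024

def sinusoidal(n, d):
    # classic interleaved sin/cos positional pattern; used here only as a
    # stand-in for the smooth structure a *trained* table might acquire
    pos = np.arange(n)[:, None]
    i = np.arange(d // 2)[None, :]
    angle = pos / 10000 ** (2 * i / d)
    pe = np.zeros((n, d))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

# A learned position-embedding table: rows 0..511 got gradient updates
# (smooth structure), rows 512..1023 never did (still random init).
table = rng.normal(size=(total_len, d_model))
table[:max_trained_len] = sinusoidal(max_trained_len, d_model)

def adjacent_cosine(m):
    # mean cosine similarity between each row and the next row
    a, b = m[:-1], m[1:]
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return float(np.mean(num / den))

smooth = adjacent_cosine(table[:max_trained_len])  # clear, repeating pattern
noisy = adjacent_cosine(table[max_trained_len:])   # chaotic, uncorrelated

print(f"trained rows:   {smooth:.3f}")   # close to 1.0
print(f"untrained rows: {noisy:.3f}")    # near 0.0
```

Adjacent positions in the "trained" region are nearly parallel vectors (a smooth, repeating signal), while adjacent positions past the training length are essentially uncorrelated noise, matching the visualization the question describes.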


Updated 2025-09-26


Tags: Ch.2 Generative Models - Foundations of Large Language Models, Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences, Analysis in Bloom's Taxonomy, Cognitive Psychology, Psychology, Social Science, Empirical Science, Science