Multiple Choice

An engineer trains a sequence-processing model on a dataset whose longest text is 512 tokens. The model performs well on texts up to this length. However, when tested on a 1000-token document, the model's output becomes incoherent for the latter half of the text. A visualization of the numerical signals used to represent token positions shows a clear, repeating pattern for the first 512 positions, but a chaotic, noisy pattern for all positions thereafter. What is the most likely explanation for this specific failure mode?
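The symptom described in the question can be reproduced with a small numeric sketch: rows of a learned position-embedding table that received gradient updates look smooth and structured, while rows beyond the trained range are still at their random initialization. All names here are illustrative, and a sinusoidal pattern is used only as a stand-in for what trained rows might look like.

```python
import numpy as np

rng = np.random.default_rng(0)
max_trained_len, d_model, total_len = 512, 64, 1024

def sinusoidal(n, d):
    # classic interleaved sin/cos positional pattern; used here only as a
    # stand-in for the smooth structure a *trained* table might acquire
    pos = np.arange(n)[:, None]
    i = np.arange(d // 2)[None, :]
    angle = pos / 10000 ** (2 * i / d)
    pe = np.zeros((n, d))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

# A learned position-embedding table: rows 0..511 got gradient updates
# (smooth structure), rows 512..1023 never did (still random init).
table = rng.normal(size=(total_len, d_model))
table[:max_trained_len] = sinusoidal(max_trained_len, d_model)

def adjacent_cosine(m):
    # mean cosine similarity between each row and the next row
    a, b = m[:-1], m[1:]
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return float(np.mean(num / den))

smooth = adjacent_cosine(table[:max_trained_len])  # clear, repeating pattern
noisy = adjacent_cosine(table[max_trained_len:])   # chaotic, uncorrelated

print(f"trained rows:   {smooth:.3f}")   # close to 1.0
print(f"untrained rows: {noisy:.3f}")    # near 0.0
```

Adjacent positions in the "trained" region are nearly parallel vectors (a smooth, repeating signal), while adjacent positions past the training length are essentially uncorrelated noise, matching the visualization the question describes.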


Updated 2025-09-26


Tags: Ch.2 Generative Models - Foundations of Large Language Models, Foundations of Large Language Models, Foundations of Large Language Models Course, Computing Sciences, Analysis in Bloom's Taxonomy, Cognitive Psychology, Psychology, Social Science, Empirical Science, Science