Adapting a Language Model for Longer Documents
An AI development team has a language model that was trained exclusively on documents of at most 4096 tokens. When they use this model to summarize new, longer documents of up to 16384 tokens, they observe a significant drop in quality: the model loses coherence and disregards information from the beginning of the longer documents. The team cannot afford to retrain the model from scratch. Based on this scenario, explain the fundamental issue with how the model processes token positions in the longer documents. Then describe the core mechanism of a technique that could resolve the issue by re-mapping the token positions.
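For concreteness, the re-mapping the question alludes to matches published position-interpolation schemes for models that use rotary position embeddings (RoPE): position indices beyond the trained range are linearly squeezed back into the interval the model saw during training. Below is a minimal sketch under that assumption; the function name and the PyTorch framing are illustrative, not part of the question.

```python
import torch

def rope_angles_with_interpolation(seq_len: int, trained_len: int,
                                   head_dim: int, base: float = 10000.0) -> torch.Tensor:
    """RoPE angle table with linear position interpolation.

    When seq_len exceeds trained_len, every position index is scaled by
    trained_len / seq_len, so the indices the model actually sees stay
    inside the [0, trained_len) range it was trained on.
    """
    # Standard rotary inverse frequencies, one per pair of head dimensions.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(seq_len, dtype=torch.float32)
    if seq_len > trained_len:
        # Linear interpolation: squeeze new positions into the trained range.
        positions = positions * (trained_len / seq_len)
    return torch.outer(positions, inv_freq)  # shape: (seq_len, head_dim // 2)
```

With seq_len = 16384 and trained_len = 4096 the scale factor is 0.25, so every scaled index stays below 4096 and no position falls outside the distribution the model learned.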
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Implementing Linear Scaling by Modifying Embedding Model Input
A language model was originally developed to process text sequences with a maximum length of 2048 positions. To enable it to handle a longer input sequence of 8192 positions, a technique is applied that linearly scales down the new position indices to fit within the model's original learned range. Given this scenario, what would be the scaled-down position index that corresponds to the token at position 6144 in the new, longer sequence?
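Under such linear scaling, a position p in the longer sequence maps to p * (original length / new length). A quick check with the numbers above (variable names are illustrative):

```python
original_len = 2048   # maximum length the model was trained on
new_len = 8192        # length of the extended input sequence
p = 6144              # token position in the longer sequence

# Linear scaling: squeeze new positions into the trained range.
scaled_p = p * (original_len / new_len)  # 6144 * (2048 / 8192) = 6144 * 0.25
print(scaled_p)  # 1536.0
```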
Calculating Scaled Positional Indices