Analyzing Period Scaling Effects
A language model developer is adapting a model originally trained with a maximum sequence length of 4096 tokens to work with a maximum sequence length of only 2048 tokens. They use the standard period scaling formula for position interpolation: $T' = T \cdot \frac{L_{\text{new}}}{L_{\text{orig}}}$, where $T$ and $T'$ are the original and scaled periods, $L_{\text{new}}$ is the new sequence length, and $L_{\text{orig}}$ is the original maximum length. Analyze the effect of this change on the periods of the positional encoding functions. Will the periods increase or decrease, and by what factor? Explain your reasoning based on the formula.
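As a way to check reasoning about the formula, here is a minimal Python sketch. It assumes the scaled period is the original period multiplied by the ratio of the new sequence length to the original maximum length, which is the usual form for position interpolation. The function name `scale_period` and its parameter names are hypothetical and chosen only for illustration. The sample values are deliberately different from those in the question.

```python
def scale_period(period: float, new_len: int, orig_len: int) -> float:
    """Scale a positional-encoding period by new_len / orig_len.

    Assumption: the period is multiplied by the ratio of the new
    sequence length to the original maximum length.
    """
    if orig_len <= 0 or new_len <= 0:
        raise ValueError("sequence lengths must be positive")
    return period * new_len / orig_len


# Ratio above 1 (longer new length): the period grows.
longer = scale_period(10.0, new_len=1024, orig_len=512)

# Ratio below 1 (shorter new length): the period shrinks.
shorter = scale_period(10.0, new_len=512, orig_len=1024)

print(longer, shorter)
```

Whether the ratio is above or below 1 determines the direction of the change, and the ratio itself is the factor.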
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A language model was originally trained with a maximum sequence length of 2048 tokens. To handle longer documents, its positional encodings are being adjusted to accommodate a new sequence length of 8192 tokens by scaling the periods of the encoding functions. If the original period for a specific dimension was 100, what is the new, adjusted period for that same dimension after this adjustment?
Analyzing Period Scaling Effects
Diagnosing an Error in Positional Encoding Scaling