Learn Before
An AI engineering team is adapting a language model to handle longer text inputs. The model was originally trained with a maximum sequence length of 4096 tokens and uses an embedding dimensionality of 128. To extend the model's context window to 16384 tokens, they must apply a scaling factor (λ) to the base of its rotary positional embeddings. Using the formula λ = (m / m_l)^(d / (d - 2)), where m is the new sequence length, m_l is the original length, and d is the dimensionality, what is the correct scaling factor to apply?
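Since the question reduces to substituting the given values into the formula, here is a minimal worked check in Python (the variable names are illustrative, not from any particular library):

```python
# Scaling factor from the formula given in the question:
#   lambda = (m / m_l) ** (d / (d - 2))
m = 16384   # target (extended) sequence length
m_l = 4096  # original training sequence length
d = 128     # embedding dimensionality

scale = (m / m_l) ** (d / (d - 2))
print(scale)  # 4 ** (128 / 126) ≈ 4.09
```

With m / m_l = 4 and an exponent of 128 / 126 ≈ 1.016, the factor comes out slightly above 4, at roughly 4.09.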
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Impact of Embedding Dimensionality on RoPE Scaling
Influence of Dimensionality on RoPE Scaling Factor