Learn Before
Impact of Embedding Dimensionality on RoPE Scaling
An AI research team is extending the context window for two different language models, Model A and Model B. Both models need to be scaled from an original length of 2048 tokens to a new length of 8192 tokens. The only difference between them is their embedding dimensionality (d):

- Model A: d = 64
- Model B: d = 512

The team will use the formula below to calculate the required scaling factor (λ):

λ = (m / m_l)^(d / (d - 2))

where m is the new sequence length, m_l is the original length, and d is the embedding dimensionality.
Without performing the full calculation, predict which model will require a larger scaling factor (λ). Justify your reasoning by analyzing how the embedding dimensionality (d) influences the exponent in the formula.
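
A minimal sketch (Python, with illustrative variable names not taken from the card) that compares the exponent d / (d - 2) and the resulting λ for the two models:

```python
# Sketch: compare λ = (m / m_l) ** (d / (d - 2)) for two dimensionalities.
m_l, m = 2048, 8192            # original and target context lengths

for name, d in [("Model A", 64), ("Model B", 512)]:
    exponent = d / (d - 2)     # shrinks toward 1 as d grows
    lam = (m / m_l) ** exponent
    print(f"{name}: d={d}, exponent={exponent:.4f}, lambda={lam:.4f}")

# Model A: d=64,  exponent ≈ 1.0323, lambda ≈ 4.18
# Model B: d=512, exponent ≈ 1.0039, lambda ≈ 4.02
# Smaller d gives a larger exponent, so Model A requires the larger λ.
```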
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
An AI engineering team is adapting a language model to handle longer text inputs. The model was originally trained with a maximum sequence length of 4096 tokens and uses an embedding dimensionality of 128. To extend the model's context window to 16384 tokens, they must apply a scaling factor (λ) to the base of its rotary positional embeddings. Using the formula λ = (m / m_l)^(d / (d - 2)), where m is the new sequence length, m_l is the original length, and d is the dimensionality, what is the correct scaling factor to apply?
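
A short check of this calculation (Python; the printed value is not part of the original card):

```python
# Sketch: apply λ = (m / m_l) ** (d / (d - 2)) to the values in the question.
m_l, m, d = 4096, 16384, 128

lam = (m / m_l) ** (d / (d - 2))   # 4 ** (128 / 126)
print(f"lambda ≈ {lam:.3f}")       # lambda ≈ 4.089
```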
Influence of Dimensionality on RoPE Scaling Factor