Learn Before
An AI engineering team is adapting a language model to handle longer text inputs. The model was originally trained with a maximum sequence length of 4096 tokens and uses an embedding dimensionality of 128. To extend the model's context window to 16384 tokens, they must apply a scaling factor (λ) to the base of its rotary positional embeddings. Using the formula λ = (m / m_l)^(d / (d - 2)), where m is the new sequence length, m_l is the original length, and d is the dimensionality, what is the correct scaling factor to apply?
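Since the question reduces to substituting the given values into the formula, here is a minimal worked check in Python (the variable names are illustrative, not from any particular library):

```python
# Scaling factor from the formula given in the question:
#   lambda = (m / m_l) ** (d / (d - 2))
m = 16384   # target (extended) sequence length
m_l = 4096  # original training sequence length
d = 128     # embedding dimensionality

scale = (m / m_l) ** (d / (d - 2))
print(scale)  # 4 ** (128 / 126) ≈ 4.09
```

With m / m_l = 4 and an exponent of 128 / 126 ≈ 1.016, the factor comes out slightly above 4, at roughly 4.09.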
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Impact of Embedding Dimensionality on RoPE Scaling
Influence of Dimensionality on RoPE Scaling Factor