Multiple Choice

An AI engineering team is adapting a language model to handle longer text inputs. The model was originally trained with a maximum sequence length of 4096 tokens and uses an embedding dimensionality of 128. To extend the model's context window to 16384 tokens, they must apply a scaling factor (λ) to the base of its rotary positional embeddings. Using the formula λ = (m / m_l)^(d / (d - 2)), where m is the new sequence length, m_l is the original length, and d is the dimensionality, what is the correct scaling factor to apply?
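A quick way to check the arithmetic is to substitute the given values directly into the formula. The short Python sketch below is only a worked check of that substitution (the variable names m, m_l, and d simply mirror the symbols in the question); with m = 16384, m_l = 4096, and d = 128 it evaluates to approximately 4.09.

# Worked check of the RoPE base scaling factor from the question:
# lambda = (m / m_l) ** (d / (d - 2))
m = 16384    # target (extended) context length in tokens
m_l = 4096   # original training context length in tokens
d = 128      # embedding dimensionality
scale = (m / m_l) ** (d / (d - 2))
print(scale)  # prints roughly 4.089, so the base is scaled by about 4.09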

Updated 2025-09-26

Tags

Ch.3 Prompting - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Application in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science