An engineer is adapting a language model to process sequences twice as long as its original design (i.e., m = 2 * m_l). They use a method where the period of the highest frequency component in the new model is set equal to that of a linearly scaled model. This relationship is captured by the equation: Given that the embedding dimensionality d is greater than 2 and the original base b is a positive constant, how must the scaling factor λ change to satisfy this constraint for the new, longer sequence length?
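The equation referenced above is missing from this copy of the question, so the following is only a minimal sketch under stated assumptions. It assumes the usual rotary parameterization, where pair i has period 2π · b^(2(i-1)/d), and it assumes the constraint is the common base-scaling one: the period of the last pair (i = d/2) under the scaled base λb equals m/m_l times its original period. In that parameterization the last pair has the longest period, so the question's "highest frequency" wording may refer to the highest dimension index; that reading is an assumption. The function names and the values of d, b, and m_l are hypothetical.

```python
import math

def rope_period(base, d, i):
    # Period of the i-th rotary pair (1-indexed), assuming
    # theta_i = base ** (-2 * (i - 1) / d). This form is an assumption.
    return 2 * math.pi * base ** (2 * (i - 1) / d)

def base_scale(m, m_l, d):
    # Solves (lam * b) ** ((d - 2) / d) == (m / m_l) * b ** ((d - 2) / d)
    # for lam; the base b cancels, leaving (m / m_l) ** (d / (d - 2)).
    return (m / m_l) ** (d / (d - 2))

# Illustrative values only; none of them come from the question.
d, b, m_l = 128, 10000.0, 4096
m = 2 * m_l

lam = base_scale(m, m_l, d)
i = d // 2  # last rotary pair
scaled = rope_period(lam * b, d, i)
interpolated = (m / m_l) * rope_period(b, d, i)

print(lam, math.isclose(scaled, interpolated))
```

Under these assumptions the factor for m = 2 * m_l is 2^(d/(d-2)). It does not depend on b, and because d > 2 makes the exponent d/(d-2) larger than 1, it is slightly larger than 2.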
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Solution for RoPE Base Scaling Factor (λ)
An engineer is adapting a language model to handle longer text sequences. The goal is to find a scaling factor, λ, for the positional encoding base, b. The method involves setting the period of the highest frequency component in the new, adapted model equal to the period of a model scaled by linear interpolation. The dimensionality of the embeddings is d, the original sequence length is m_l, and the new sequence length is m. This constraint is captured by the following equation: Which part of this equation represents the period of the highest frequency dimension for the new model being developed?
Evaluating a Proposed Simplification