A language model's positional embeddings are being adapted for a new context length. The adaptation uses a scaling factor λ = 0.25, a base term b = 10000, and an embedding dimension d = 128. Based on the formula for the scaled frequency parameters, θ' = [ (λb)^(-0/d), (λb)^(-2/d), ..., (λb)^(-(d-2)/d) ], what is the approximate value of the second component in the θ' vector?
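The arithmetic can be checked with a minimal sketch. It assumes the components use the even exponents 0, 2, ..., d-2 exactly as the formula lists them, so the second component is (λb)^(-2/d) = 2500^(-1/64). The function name `scaled_frequencies` is hypothetical and is introduced only for illustration.

```python
def scaled_frequencies(lam: float, base: float, dim: int) -> list[float]:
    """Return [(lam * base) ** (-2i / dim) for i = 0 .. dim/2 - 1]."""
    scaled_base = lam * base  # 0.25 * 10000 = 2500
    return [scaled_base ** (-(2 * i) / dim) for i in range(dim // 2)]


theta_prime = scaled_frequencies(lam=0.25, base=10000, dim=128)

# The second component has index 1, i.e. exponent -2/128 = -1/64.
second = theta_prime[1]
print(round(second, 3))  # 0.885
```

The first component is always 1, since any positive base raised to the power 0 is 1. The second component comes out slightly below 1 because the exponent -1/64 is small in magnitude.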
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Evaluating RoPE Scaling for Context Extension
Impact of Scaling Factor on RoPE Frequencies