Learn Before
Analysis of Period Scaling in Positional Embeddings
In a rotational positional embedding scheme, the frequency for each dimension i is determined by the formula θ_i = b^(-2i/d), where b is a constant base and d is the total number of dimensions. If this base b is scaled by a factor, explain why this results in a non-uniform change to the rotational periods across the different dimensions i.
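One way to reason about the question is to compute the periods numerically. The sketch below assumes the period of dimension i is T_i = 2π / θ_i = 2π · b^(2i/d), and that i runs from 0 to d/2 − 1. The index range, the helper name `rotation_periods`, and the values of `d`, `base`, and `scale` are illustrative assumptions, not taken from the original material.

```python
import math

def rotation_periods(base: float, d: int) -> list[float]:
    """Return T_i = 2*pi / theta_i for theta_i = base ** (-2 * i / d).

    Assumes i = 0, 1, ..., d/2 - 1 (an indexing convention chosen for this sketch).
    """
    return [2 * math.pi * base ** (2 * i / d) for i in range(d // 2)]

d = 8           # illustrative embedding dimension
base = 10000.0  # illustrative base b
scale = 4.0     # hypothetical scaling factor applied to b

original = rotation_periods(base, d)
scaled = rotation_periods(scale * base, d)

# The ratio of new to old period for dimension i is scale ** (2 * i / d),
# so it depends on i: it is 1 at i = 0 and grows with i.
ratios = [t1 / t0 for t0, t1 in zip(original, scaled)]
for i, (t0, t1, r) in enumerate(zip(original, scaled, ratios)):
    print(f"i={i}: period {t0:.2f} -> {t1:.2f}, ratio {r:.3f}")
```

Under these assumptions the base enters each period through the exponent 2i/d, so multiplying b by a factor λ multiplies T_i by λ^(2i/d). The period at i = 0 is unchanged, and periods at larger i are stretched progressively more.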
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A developer is adapting a pre-trained language model that uses rotational position embeddings to handle much longer input sequences. They achieve this by applying a scaling factor to the base b used in the frequency calculations for the embeddings. Which statement best analyzes the impact of this change on the periods of the rotational frequencies across the different embedding dimensions?
When the base b used to calculate frequency parameters in Rotary Positional Embeddings is multiplied by a scaling factor, the periods associated with all dimensions of the embedding are scaled by an identical, uniform amount.
Analysis of Period Scaling in Positional Embeddings