Learn Before
Formula for Scaling the Period in Position Interpolation
To adjust the period of the sine and cosine functions in Rotary Positional Embeddings for longer sequences, one approach is to scale up the period by a factor of $l_{\text{new}}/l_{\text{orig}}$. The adjusted period is given by $T'_i = \frac{l_{\text{new}}}{l_{\text{orig}}} \cdot 2\pi \cdot \beta^{2i/d}$, where $l_{\text{new}}$ is the new sequence length, $l_{\text{orig}}$ is the original sequence length, and $\beta$ is the base.
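The scaling rule above can be sketched in a few lines of Python. This is a minimal illustration, not an implementation from the course; the function names (`rope_periods`, `scaled_periods`) and the small `d_model` are chosen here for demonstration.

```python
import math

def rope_periods(d_model: int, base: float = 10000.0) -> list[float]:
    """Period of the sine/cosine pair at dimension index i: T_i = 2*pi * base**(2*i/d_model)."""
    return [2 * math.pi * base ** (2 * i / d_model) for i in range(d_model // 2)]

def scaled_periods(d_model: int, l_orig: int, l_new: int, base: float = 10000.0) -> list[float]:
    """Position-interpolation adjustment: scale every period up by l_new / l_orig."""
    factor = l_new / l_orig
    return [factor * T for T in rope_periods(d_model, base)]

# Doubling the context window (4096 -> 8192) doubles every period.
orig = rope_periods(8)
new = scaled_periods(8, l_orig=4096, l_new=8192)
print(new[0] / orig[0])  # 2.0
```

Note that every dimension's period is scaled by the same factor, so the relative structure of the encoding is preserved; only the rate at which positions advance through each cycle changes.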

Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.2 Generative Models - Foundations of Large Language Models
Related
Formula for Scaling the Period in Position Interpolation
A language model was originally trained to handle text up to a maximum of 4096 tokens. To enable it to process a document with 8192 tokens without retraining, a modification is made to its positional encoding functions. Based on the principles of position interpolation, which statement best describes the nature and effect of this modification?
Analyzing a Positional Encoding Modification
Mechanism of Position Interpolation
To enable a language model to process sequences longer than its original training limit, the period of its positional encoding functions must be increased. This adjustment compresses the positional signal so that the new, more distant positions are mapped within the range the model has already learned.
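One way to see why increasing the period works: scaling every period up by $s = l_{\text{new}}/l_{\text{orig}}$ is equivalent to interpolating the position index down by $1/s$, so an out-of-range position lands on an angle the model saw during training. A small sketch (the `angle` helper and dimensions are hypothetical, for illustration only):

```python
import math

def angle(pos: float, i: int, d_model: int, base: float = 10000.0) -> float:
    """Rotation angle used by a rotary encoding at position `pos`, dimension pair i."""
    return pos / base ** (2 * i / d_model)

# Scaling the period up by s = l_new / l_orig shrinks every angle by s,
# which is the same as evaluating the original encoding at position pos / s.
l_orig, l_new = 4096, 8192
s = l_new / l_orig
for i in range(4):
    a_interpolated = angle(8192 / s, i, d_model=8)   # interpolated position
    a_scaled_period = angle(8192, i, d_model=8) / s  # period scaled up by s
    assert math.isclose(a_interpolated, a_scaled_period)
```

So position 8192 under the adjusted encoding produces exactly the angles that position 4096 produced under the original one.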
Learn After
A language model was originally trained with a maximum sequence length of 2048 tokens. To handle longer documents, its positional encodings are being adjusted to accommodate a new sequence length of 8192 tokens by scaling the periods of the encoding functions. If the original period for a specific dimension was 100, what is the new, adjusted period for that same dimension after this adjustment?
Analyzing Period Scaling Effects
Diagnosing an Error in Positional Encoding Scaling