Learn Before
Tuning Positional Embeddings for Long-Context Models
An AI engineer is fine-tuning a language model to summarize very long legal documents. The model is underperforming, and the engineer suspects it is failing to capture relationships between pieces of information that are far apart in the text. The model uses positional embeddings based on the frequency parameter formula $\theta_k = b^{-2(k-1)/d}$. To improve the model's ability to handle these long-range dependencies, the frequencies ($\theta_k$) need to correspond to longer periods. Which parameter in the formula ($b$ or $d$) should the engineer adjust, and in which direction (increase or decrease), to achieve this? Justify your reasoning.
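One way to explore the question numerically is the minimal sketch below. It assumes the standard RoPE form $\theta_k = b^{-2(k-1)/d}$ with $k = 1, \dots, d/2$, and takes the period of each sine/cosine component to be $2\pi/\theta_k$. The function names `rope_frequencies` and `rope_periods` and the sample values of `b` and `d` are illustrative choices, not taken from the course material.

```python
import math


def rope_frequencies(b: float, d: int) -> list[float]:
    """Frequencies theta_k = b ** (-2 * (k - 1) / d) for k = 1 .. d/2.

    Assumption: this is the standard RoPE parameterization; the course
    page itself does not show the formula.
    """
    return [b ** (-2.0 * (k - 1) / d) for k in range(1, d // 2 + 1)]


def rope_periods(b: float, d: int) -> list[float]:
    """Period of sin(theta_k * t) and cos(theta_k * t), i.e. 2*pi / theta_k."""
    return [2.0 * math.pi / theta for theta in rope_frequencies(b, d)]


if __name__ == "__main__":
    d = 8  # small illustrative dimensionality
    for b in (10_000.0, 500_000.0):  # hypothetical base values to compare
        longest = max(rope_periods(b, d))
        print(f"b={b:>9.0f}  longest period ~ {longest:,.0f} positions")
```

Under this assumed formula, the component with $k = 1$ always has $\theta_1 = 1$, so changing $b$ only affects components with $k > 1$. Comparing the printed periods for the two bases shows how the longest wavelength responds to the base.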
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Evaluation in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Calculation of RoPE Frequency Parameters
Formula for the Period of RoPE's Sine and Cosine Components
Consider the generalized formula for calculating a set of frequency parameters: $\theta_k = b^{-2(k-1)/d}$. In this formula, $b$ is a configurable base greater than 1, $d$ is the dimensionality (a positive integer), and $k$ is the component index, which is an integer greater than 1. How would increasing the value of the base $b$ affect the calculated frequency $\theta_k$ for any given $k$ and $d$?
Determining the Base from a Frequency Parameter
Tuning Positional Embeddings for Long-Context Models