A language model, originally trained with a maximum sequence length of 2048, is being adapted to handle a new maximum length of 8192 using linear positional interpolation. This technique modifies the rotary position embedding function (Ro) by scaling the position index. The new function, Ro', for a token x_i at position i in the longer sequence, is equivalent to applying the original function Ro with a scaled positional argument: Ro'(x_i, iθ) = Ro(x_i, (original_max_length / new_max_length) * iθ). For a token at position i = 4096 in the new, extended context, what is the scaled positional argument that would be passed to the original Ro function?
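The scaled argument can be checked with a short calculation. The sketch below simply evaluates the interpolation formula from the question (the function name `scaled_position` is illustrative, not from the original):

```python
# Linear positional interpolation: the position index is rescaled by
# original_max_length / new_max_length before the rotary embedding is applied.
original_max_length = 2048
new_max_length = 8192

def scaled_position(i: int) -> float:
    """Return the scaled position passed to the original Ro function."""
    return (original_max_length / new_max_length) * i

# For the token at position i = 4096 in the extended context:
print(scaled_position(4096))  # 1024.0, i.e. the argument passed to Ro is 1024θ
```

So the token at position 4096 is mapped back into the original trained range, and the argument passed to Ro is 1024θ.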
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Adapting a Language Model for Longer Sequences
A language model's context window is being extended from an original maximum length m_l to a new, larger maximum length m. The technique used modifies the rotary position embedding function (Ro) by scaling the position index i according to the formula: New Effective Position = (m_l / m) * i. This formula implies that the effective position of a token at the very end of the new, extended context (position m - 1) is mapped to a position that falls outside the range of the original model's trained positions, which is [0, m_l - 1].
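A quick numerical check makes the boundary claim concrete. Using the example values m_l = 2048 and m = 8192 (these specific numbers are illustrative assumptions, taken from the question above):

```python
m_l = 2048   # original maximum length (example value)
m = 8192     # extended, larger maximum length (example value)

# Effective position of the last token (position m - 1) under the
# scaling formula: New Effective Position = (m_l / m) * i
effective = (m_l / m) * (m - 1)

print(effective)            # 2047.75
print(effective > m_l - 1)  # True: 2047.75 lies outside [0, 2047]
```

Because m_l / m < 1, the last token maps to m_l - m_l/m, which is strictly between m_l - 1 and m_l, so it indeed falls just outside the trained range [0, m_l - 1].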