A language model, originally trained with a maximum sequence length of 2048, is being adapted to handle a new maximum length of 8192 using linear positional interpolation. This technique modifies the rotary position embedding function (Ro) by scaling the position index. The new function, Ro', for a token x_i at position i in the longer sequence, is equivalent to applying the original function Ro with a scaled positional argument: Ro'(x_i, iθ) = Ro(x_i, (original_max_length / new_max_length) * iθ). For a token at position i = 4096 in the new, extended context, what is the scaled positional argument that would be passed to the original Ro function?
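The scaled argument can be checked with a short calculation. The sketch below simply evaluates the interpolation formula from the question (the function name `scaled_position` is illustrative, not from the original):

```python
# Linear positional interpolation: the position index is rescaled by
# original_max_length / new_max_length before the rotary embedding is applied.
original_max_length = 2048
new_max_length = 8192

def scaled_position(i: int) -> float:
    """Return the scaled position passed to the original Ro function."""
    return (original_max_length / new_max_length) * i

# For the token at position i = 4096 in the extended context:
print(scaled_position(4096))  # 1024.0, i.e. the argument passed to Ro is 1024θ
```

So the token at position 4096 is mapped back into the original trained range, and the argument passed to Ro is 1024θ.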
Tags
Ch.3 Prompting - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Adapting a Language Model for Longer Sequences
A language model's context window is being extended from an original maximum length m_l to a new, larger maximum length m. The technique used modifies the rotary position embedding function (Ro) by scaling the position index i according to the formula: New Effective Position = (m_l / m) * i. This formula implies that the effective position of a token at the very end of the new, extended context (position m - 1) is mapped to a position that falls outside the range of the original model's trained positions, which is [0, m_l - 1].
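A quick numerical check makes the boundary claim concrete. Using the example values m_l = 2048 and m = 8192 (these specific numbers are illustrative assumptions, taken from the question above):

```python
m_l = 2048   # original maximum length (example value)
m = 8192     # extended, larger maximum length (example value)

# Effective position of the last token (position m - 1) under the
# scaling formula: New Effective Position = (m_l / m) * i
effective = (m_l / m) * (m - 1)

print(effective)            # 2047.75
print(effective > m_l - 1)  # True: 2047.75 lies outside [0, 2047]
```

Because m_l / m < 1, the last token maps to m_l - m_l/m, which is strictly between m_l - 1 and m_l, so it indeed falls just outside the trained range [0, m_l - 1].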