Multiple Choice

A language model, originally trained with a maximum sequence length of 2048, is being adapted to handle a new maximum length of 8192 using linear positional interpolation. This technique modifies the rotary position embedding function (Ro) by scaling the position index. The new function, Ro', for a token x_i at position i in the longer sequence, is equivalent to applying the original function Ro with a scaled positional argument: Ro'(x_i, iθ) = Ro(x_i, (original_max_length / new_max_length) * iθ). For a token at position i = 4096 in the new, extended context, what is the scaled positional argument that would be passed to the original Ro function?
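The arithmetic the question asks for can be sanity-checked with a short sketch (this is an editorial addition, not part of the original item; variable names are illustrative):

```python
# Linear positional interpolation scales each position index by
# original_max_length / new_max_length before applying the original Ro.
original_max_length = 2048
new_max_length = 8192

scale = original_max_length / new_max_length  # 2048 / 8192 = 0.25
i = 4096
scaled_position = scale * i  # 0.25 * 4096 = 1024.0

# The scaled positional argument passed to Ro is therefore 1024θ.
print(scaled_position)
```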

Updated 2025-09-29

Tags

Ch.3 Prompting - Foundations of Large Language Models
