Multiple Choice

Two engineers are modifying a language model's Rotary Positional Embeddings (RoPE) to handle longer text sequences.

  • Engineer A proposes modifying the core RoPE transformation function itself (creating a new function, Ro') while keeping the original positional angles (θ) the same.
  • Engineer B proposes keeping the original RoPE transformation function (Ro) unchanged but applying it to a new, scaled set of positional angles (θ').

To preserve relative positional information during this context extension, one condition must hold: the output of the new system must equal the output of the original system applied to scaled positions. Based on this principle, which engineer's approach is more theoretically sound, and why?
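The condition above can be made concrete with a small numerical sketch. It assumes the standard pairwise-rotation form of RoPE and a simple linear scaling θ' = θ / factor; the helper names (`rope`, `base_thetas`, `scaled_thetas`), the base of 10000, the factor of 4, and the sample vectors are all illustrative assumptions, not something the question specifies. The sketch checks two things: that the unchanged function applied with scaled angles matches the original system evaluated at a scaled position, and that the attention score still depends only on the relative offset.

```python
import math

def rope(x, pos, thetas):
    """Rotate each consecutive pair of x by the angle pos * theta_k."""
    out = []
    for k, theta in enumerate(thetas):
        a, b = x[2 * k], x[2 * k + 1]
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        out += [a * c - b * s, a * s + b * c]
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def base_thetas(dim, base=10000.0):
    # Commonly used RoPE frequencies (assumed here): theta_k = base^(-2k/dim)
    return [base ** (-2 * k / dim) for k in range(dim // 2)]

def scaled_thetas(thetas, factor):
    # Assumed linear scaling of the angles: theta' = theta / factor
    return [t / factor for t in thetas]

q = [0.3, -1.2, 0.7, 0.5]   # arbitrary sample query vector
k = [1.1, 0.4, -0.6, 0.9]   # arbitrary sample key vector

thetas = base_thetas(4)
thetas_p = scaled_thetas(thetas, factor=4.0)

# Same function, scaled angles, position 8 ...
new_system = rope(q, 8, thetas_p)
# ... versus the original system at the scaled position 8 / 4 = 2.
original_at_scaled_pos = rope(q, 2, thetas)

# Two query/key pairs with the same relative offset (3) at different absolute positions.
score_near = dot(rope(q, 10, thetas_p), rope(k, 7, thetas_p))
score_far = dot(rope(q, 103, thetas_p), rope(k, 100, thetas_p))
```

Under these assumptions, `new_system` and `original_at_scaled_pos` agree up to floating-point error, and `score_near` matches `score_far`, because rotating by position-dependent angles leaves the dot product dependent only on the position difference.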

Updated 2025-10-03

Tags

Ch.3 Prompting - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Evaluation in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science