Multiple Choice

A language model was trained exclusively on text sequences with a maximum length of 1024 tokens. When presented with a 2048-token sequence, two different approaches are considered for generating positional information for the new, unseen positions (1024 to 2047).

Approach X: The mechanism generates values for the new positions by continuing the mathematical pattern it learned from the original 0-1023 positions.

Approach Y: The mechanism rescales the positional indices of the entire 2048-token sequence so that they all map to values within the original 0-1023 range.

Which statement correctly categorizes these two approaches?

0

1

Updated 2025-09-26

Contributors are:

Who are from:

Tags

Ch.2 Generative Models - Foundations of Large Language Models

Foundations of Large Language Models

Foundations of Large Language Models Course

Computing Sciences

Analysis in Bloom's Taxonomy

Cognitive Psychology

Psychology

Social Science

Empirical Science

Science