Multiple Choice

A large language model was originally trained with a maximum context window of 2048 tokens. You are now tasked with enabling it to process a sequence of 4096 tokens using a technique that scales the position indices of the longer sequence to fit within the model's original learned range. How should the position index for the token at position 3072 in the 4096-token sequence be handled before being passed to the embedding layer?
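The scaling the question describes is position interpolation: each position index in the longer sequence is multiplied by the ratio of the original context length to the new one, so every index lands inside the range the model saw during training. A minimal sketch (the function name is hypothetical, not from any particular library):

```python
def interpolate_positions(seq_len, max_trained_len):
    """Scale position indices of a long sequence into the model's
    originally trained range (position interpolation)."""
    scale = max_trained_len / seq_len  # e.g. 2048 / 4096 = 0.5
    return [p * scale for p in range(seq_len)]

positions = interpolate_positions(4096, 2048)
print(positions[3072])  # 3072 * 0.5 = 1536.0
```

Under this scheme the token at position 3072 is passed to the embedding layer with the scaled index 3072 × (2048 / 4096) = 1536.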

Updated 2025-10-03

Tags: Ch.3 Prompting - Foundations of Large Language Models; Foundations of Large Language Models; Foundations of Large Language Models Course; Computing Sciences; Application in Bloom's Taxonomy
