Learn Before
An engineer is analyzing a model's architecture and notes that positional information is incorporated by applying a rotational transformation to the token embedding vectors. This transformation changes a vector's direction based on its position in the sequence but preserves its original length. Which statement correctly analyzes this technique in contrast to another common, non-learnable method?
Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Ch.3 Prompting - Foundations of Large Language Models
Analysis in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
A key distinction between two common non-learnable positional encoding methods is that one applies a multiplicative rotational transformation to token embeddings, while the other applies an additive operation by summing a positional vector with the token embeddings.
Match each operational description to the corresponding non-learnable positional embedding method.
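The additive versus multiplicative distinction described above can be illustrated with a minimal NumPy sketch. It assumes the additive method is the fixed sinusoidal encoding and the rotational method rotates consecutive (even, odd) coordinate pairs, as in rotary embeddings. The function names `sinusoidal_positions` and `rotate_by_position`, the `base` value of 10000, and the toy shapes are illustrative assumptions, not taken from the course material.

```python
import numpy as np

def sinusoidal_positions(seq_len, dim, base=10000.0):
    """Additive scheme: build one fixed positional vector per position (dim must be even)."""
    pos = np.arange(seq_len)[:, None]               # shape (seq_len, 1)
    freqs = base ** (-np.arange(0, dim, 2) / dim)   # shape (dim / 2,)
    angles = pos * freqs                            # shape (seq_len, dim / 2)
    pe = np.zeros((seq_len, dim))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def rotate_by_position(x, base=10000.0):
    """Multiplicative scheme: rotate each (even, odd) pair by a position-dependent angle."""
    seq_len, dim = x.shape
    pos = np.arange(seq_len)[:, None]
    freqs = base ** (-np.arange(0, dim, 2) / dim)
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin       # 2D rotation, first coordinate
    out[:, 1::2] = x_even * sin + x_odd * cos       # 2D rotation, second coordinate
    return out

# Toy token embeddings: 6 positions, 8 dimensions (hypothetical values).
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 8))

added = x + sinusoidal_positions(6, 8)   # additive: sum a positional vector with the embedding
rotated = rotate_by_position(x)          # multiplicative: rotate the embedding itself

original_norms = np.linalg.norm(x, axis=1)
rotation_keeps_length = np.allclose(np.linalg.norm(rotated, axis=1), original_norms)
addition_keeps_length = np.allclose(np.linalg.norm(added, axis=1), original_norms)
print("rotation preserves length:", rotation_keeps_length)
print("addition preserves length:", addition_keeps_length)
```

In this sketch the rotation changes only the direction of each embedding, so vector lengths are unchanged, whereas adding a positional vector generally changes both direction and length.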