Input Embedding with Positional Encoding
To encode positional context into a sequence, a positional embedding is combined with the token embedding. For a token at position $i$, given its position-independent token embedding $\mathbf{e}_i$ and its positional embedding $\mathbf{p}_i$, the final input representation $\mathbf{x}_i$ is obtained by adding them together: $\mathbf{x}_i = \mathbf{e}_i + \mathbf{p}_i$.
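
A minimal sketch of this addition, assuming PyTorch and a learnable absolute positional embedding table; the vocabulary size, maximum length, and embedding dimension below are illustrative values, not taken from the note:

```python
import torch
import torch.nn as nn

class InputEmbedding(nn.Module):
    """Token embedding plus learnable absolute positional embedding (a sketch)."""

    def __init__(self, vocab_size: int, max_len: int, d_model: int):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)  # e_i: position-independent
        self.pos_emb = nn.Embedding(max_len, d_model)       # p_i: depends only on position i

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len)
        seq_len = token_ids.size(1)
        positions = torch.arange(seq_len, device=token_ids.device)  # 0, 1, ..., seq_len-1
        # x_i = e_i + p_i, broadcast over the batch dimension
        return self.token_emb(token_ids) + self.pos_emb(positions)

# Usage: two 5-token sequences; each position now carries order information
emb = InputEmbedding(vocab_size=1000, max_len=128, d_model=16)
x = emb(torch.randint(0, 1000, (2, 5)))
print(x.shape)  # torch.Size([2, 5, 16])
```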

Tags
Ch.2 Generative Models - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
Self-Attention layer understanding - Step 5 - Adding the time
Learnable Absolute Positional Embeddings
Initial Input Representation for Transformer Layers
Comparison of Arbitrary Order Prediction and Masked Language Modeling
An engineer builds a language model where all input words in a sentence are processed simultaneously and independently before their information is combined. When testing the model with the sentences 'The cat chased the dog' and 'The dog chased the cat', the engineer observes that the model generates identical internal representations for both, failing to capture their different meanings. Which of the following modifications would most directly address this fundamental flaw?
Model Architecture Design Choice
Analyzing Order-Insensitivity in Language Models
Learn After
A researcher is training a sequence-processing model and observes that while it correctly identifies the meaning of individual words, it consistently fails on tasks where word order is crucial. For example, it treats 'dog bites man' and 'man bites dog' as having the same overall meaning. The researcher suspects an issue in how the initial input vectors are constructed for the model. What is the most probable cause of this issue?
Constructing an Input Vector for a Sequence Model
Calculating a Combined Input Vector
Learnable Absolute Positional Embeddings