A language model is given two token sequences: sequence x with 10 tokens (at positions 0 through 9) and sequence y with 5 tokens. To process them together, they are concatenated into a single sequence [x, y]. How is the initial input vector for the very first token of the original sequence y calculated before being passed to the first processing layer?
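A minimal sketch of the computation the question asks about, assuming a GPT-style model with learned absolute positional embeddings that are added to token embeddings (all dimensions and token ids below are hypothetical): after concatenation, the first token of y occupies global position len(x) = 10, so its initial input vector is its token embedding plus the positional embedding for position 10, not position 0.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model dimensions, chosen only for illustration.
vocab_size, d_model, max_len = 100, 8, 32

# Learned lookup tables: one row per token id / per position.
token_emb = rng.normal(size=(vocab_size, d_model))
pos_emb = rng.normal(size=(max_len, d_model))

x_ids = [3, 7, 1, 4, 9, 2, 8, 5, 6, 0]   # 10 tokens of x, positions 0..9
y_ids = [11, 12, 13, 14, 15]             # 5 tokens of y

seq = x_ids + y_ids                       # concatenated sequence [x, y]

# The first token of y sits at global position len(x) = 10.
first_y_pos = len(x_ids)
first_y_id = seq[first_y_pos]

# Initial input vector passed to the first layer:
# token embedding + positional embedding at the *global* position.
h0 = token_emb[first_y_id] + pos_emb[first_y_pos]
```

Note that with relative or rotary position schemes the mechanics differ, but the key point is the same: y's first token is indexed by its position in the concatenated sequence, so its position information reflects offset 10.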
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Application in Bloom's Taxonomy
Cognitive Psychology
Psychology
Social Science
Empirical Science
Science
Related
Positional Context in Concatenated Sequences
Evaluating a Representation Generation Method