Short Answer

Constructing the Input Hidden State for a Prefix-Tuned Layer

In a transformer model adapted with prefix tuning, consider the input to layer l+1. The prefix for this layer consists of 10 trainable vectors. The hidden states corresponding to the original text input, produced by the previous layer l, form a sequence of 512 vectors. Every vector in the model has dimension 768. Describe the structure of the complete hidden-state sequence fed into the self-attention mechanism of layer l+1, and state its final dimensions.
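A minimal sketch of this construction, assuming a PyTorch-style implementation (the tensor names and random initialization are illustrative, not from any particular codebase): the 10 prefix vectors are prepended to the 512 token hidden states along the sequence dimension, so self-attention at layer l+1 sees a sequence of 10 + 512 = 522 vectors, each of dimension 768.

import torch

batch_size, prefix_len, seq_len, d_model = 1, 10, 512, 768

# Trainable prefix for layer l+1: 10 vectors of dimension 768.
# In prefix tuning, these are the only parameters that receive
# gradient updates; the base model weights stay frozen.
prefix = torch.nn.Parameter(torch.randn(prefix_len, d_model))

# Hidden states for the 512 input tokens, as output by layer l
# (random values stand in for the real activations here).
hidden_states = torch.randn(batch_size, seq_len, d_model)

# Prepend the prefix along the sequence axis to build the full
# input to the self-attention mechanism of layer l+1.
prefix_batched = prefix.unsqueeze(0).expand(batch_size, -1, -1)
full_input = torch.cat([prefix_batched, hidden_states], dim=1)

print(full_input.shape)  # torch.Size([1, 522, 768])

Per example, the resulting sequence is the 10 prefix vectors followed by the 512 token hidden states, giving final dimensions 522 x 768.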

