Essay

Diagnosing a PEFT Implementation Bug: Prompt Tuning vs Prefix Fine-Tuning

You are reviewing a teammate's implementation of parameter-efficient adaptation for a frozen Transformer-based LLM. They claim to have implemented prefix fine-tuning using continuous (soft) prompts.

They describe their code as follows (a minimal sketch of this setup follows the list):

  • They create a trainable matrix P of shape (n, d_model) and prepend it to the token embeddings only once, before the first Transformer layer.
  • At every Transformer layer l, the hidden-state sequence keeps the same length as it is passed forward; no extra positions are inserted inside deeper layers.
  • They report that training updates only P, and the base model weights remain frozen.
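
For concreteness, the description above amounts to something like the following minimal sketch (assuming a frozen decoder-only Transformer that accepts input embeddings directly; names such as SoftPromptModel, base_model, and n_virtual_tokens are illustrative, not taken from the teammate's code):

    import torch
    import torch.nn as nn

    class SoftPromptModel(nn.Module):
        """Illustrative only: a single trainable matrix P, prepended once."""

        def __init__(self, base_model, n_virtual_tokens, d_model):
            super().__init__()
            self.base_model = base_model
            for p in self.base_model.parameters():
                p.requires_grad = False            # base model stays frozen
            # The one trainable matrix P of shape (n, d_model)
            self.P = nn.Parameter(torch.randn(n_virtual_tokens, d_model) * 0.02)

        def forward(self, input_embeds):
            # P is prepended exactly once, before the first Transformer layer;
            # every deeper layer then sees the same (n + seq_len) positions.
            batch_size = input_embeds.size(0)
            soft_prompt = self.P.unsqueeze(0).expand(batch_size, -1, -1)
            hidden = torch.cat([soft_prompt, input_embeds], dim=1)
            # base_model is assumed to take a (batch, seq, d_model) tensor of
            # embeddings and run all of its frozen layers over it unchanged.
            return self.base_model(hidden)

Nothing else in the forward pass is modified; the only trainable parameters enter at the embedding level.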

A second teammate argues this is actually prompt tuning, not prefix fine-tuning, and that true prefix fine-tuning would change the input composition inside each layer.

Write an analysis that (1) determines who is correct, (2) explains the key architectural/mechanistic difference using the idea of continuous prompts and the layer-wise input composition (i.e., how the layer input H^l is formed when prefixes are used), and (3) proposes a concrete fix that makes the implementation match prefix fine-tuning while remaining parameter-efficient (describe what must be introduced per layer and where it is concatenated).
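
For reference, the per-layer composition that (2) and (3) refer to can be sketched as follows. This is an illustration, not the teammate's code: layers is assumed to be the frozen model's stack of Transformer blocks (each mapping a (batch, seq, d_model) tensor to one of the same shape), and the per-layer matrices P^l are the only new trainable parameters:

    import torch
    import torch.nn as nn

    class PerLayerPrefixModel(nn.Module):
        """Illustrative only: one trainable prefix matrix per layer."""

        def __init__(self, layers, n_prefix, d_model):
            super().__init__()
            self.layers = layers                   # frozen nn.ModuleList of blocks
            for p in self.layers.parameters():
                p.requires_grad = False
            # One trainable prefix P^l of shape (n, d_model) for every layer l
            self.prefixes = nn.ParameterList(
                [nn.Parameter(torch.randn(n_prefix, d_model) * 0.02)
                 for _ in range(len(layers))]
            )

        def forward(self, input_embeds):
            batch_size = input_embeds.size(0)
            hidden = input_embeds                  # real-token positions only
            for layer, P_l in zip(self.layers, self.prefixes):
                prefix = P_l.unsqueeze(0).expand(batch_size, -1, -1)
                # H^l is formed by concatenating this layer's own prefix with
                # the previous layer's outputs at the real-token positions.
                h_in = torch.cat([prefix, hidden], dim=1)
                out = layer(h_in)
                hidden = out[:, prefix.size(1):]   # drop prefix positions again
            return hidden

In practice, prefix tuning implementations often realise the same effect by concatenating trainable key and value prefixes inside each attention block rather than full hidden-state prefixes, but the per-layer, per-position concatenation is the point the question is probing.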
