Essay

Choosing and Explaining a PEFT Strategy Under Deployment Constraints

Your company maintains a single, frozen 30B-parameter Transformer model behind a shared inference service. You must ship 12 task-specific adaptations (e.g., contract clause extraction, customer-email triage, internal policy Q&A) to different product teams. Constraints: (1) you cannot store or deploy 12 full model copies; (2) the inference service team will not accept per-task changes that require modifying the model’s internal layer code paths, but they will allow per-request changes to the input embeddings; (3) tasks are stable for months, but product teams occasionally request small behavior tweaks that must be delivered within a day.
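For concreteness, constraint (2) is exactly the hook that soft-prompt methods exploit: a per-task artifact can be injected by editing only the input-embedding sequence of a request. The sketch below illustrates this, assuming a PyTorch-style interface; compose_input, d_model, and prompt_len are illustrative names and sizes, not part of any real service API.

```python
import torch

# Illustrative sizes only; the real 30B model stays frozen behind the service.
d_model = 4096      # hidden size of the shared frozen model (assumed)
prompt_len = 20     # number of trainable soft-prompt vectors per task (assumed)

# The entire per-task artifact for prompt tuning: one small matrix of trained
# soft-prompt embeddings (random here, standing in for trained weights).
task_soft_prompt = torch.randn(prompt_len, d_model)

def compose_input(token_embeddings: torch.Tensor,
                  soft_prompt: torch.Tensor) -> torch.Tensor:
    """Prompt tuning at inference: prepend the task's trained soft-prompt
    vectors to the request's token embeddings. Only the input-embedding
    sequence changes; no internal layer code path is touched."""
    # token_embeddings: (seq_len, d_model) for one request
    return torch.cat([soft_prompt, token_embeddings], dim=0)

# One incoming request: 7 tokens already mapped through the frozen embedding table.
request_embeddings = torch.randn(7, d_model)
model_input = compose_input(request_embeddings, task_soft_prompt)
print(model_input.shape)  # torch.Size([27, 4096]) == (prompt_len + seq_len, d_model)
```

Under this scheme, switching tasks means swapping one small matrix per request, and a behavior tweak means retraining only that matrix against the frozen model.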

Write an essay recommending an adaptation approach that fits these constraints, explicitly comparing prompt tuning with prefix fine-tuning, two PEFT methods that use continuous (soft) prompts. In your justification, explain (a) where the trainable vectors live and how they are applied at inference time, (b) what “input composition” looks like in prefix fine-tuning at a Transformer layer (i.e., how trainable prefix vectors and previous-layer hidden states form the layer input), and (c) the practical trade-offs you are accepting regarding deployment complexity, storage per task, and speed of making small updates.
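For part (b), the sketch below shows the input composition prefix fine-tuning would require at each Transformer layer: layer l attends over its own trainable prefix vectors concatenated with the previous layer's hidden states. This is again a hedged illustration with assumed names and sizes (layer_input, prefix_len, n_layers), not a full implementation.

```python
import torch

d_model, prefix_len, n_layers = 4096, 10, 32   # assumed sizes for illustration

# Per-task artifact for prefix fine-tuning: a separate trainable prefix for
# EVERY layer (random here, standing in for trained weights).
task_prefixes = torch.randn(n_layers, prefix_len, d_model)

def layer_input(prefix_l: torch.Tensor,
                prev_hidden: torch.Tensor) -> torch.Tensor:
    """Input composition at layer l: the trainable prefix vectors for this
    layer are concatenated with the previous layer's hidden states, and the
    layer then runs over the combined sequence."""
    # prefix_l: (prefix_len, d_model); prev_hidden: (seq_len, d_model)
    return torch.cat([prefix_l, prev_hidden], dim=0)

l = 5                                    # any layer index
prev_hidden = torch.randn(7, d_model)    # hidden states leaving layer l-1
z = layer_input(task_prefixes[l], prev_hidden)
print(z.shape)  # torch.Size([17, 4096]) == (prefix_len + seq_len, d_model)
```

The contrast with the first sketch is the crux of the comparison: prefix fine-tuning's composition happens inside every layer, while prompt tuning's happens once, at the input embeddings.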

