Essay

Stabilizing an LLM Feature Under Drift Using Search, Ensembling, and Evolutionary Optimization

You own an internal LLM feature that extracts structured fields ("customer_intent", "urgency", "product") from short, messy chat messages. After a model version update, accuracy becomes unstable: some prompt wordings still work well, but performance varies widely across customer segments and from week to week as the underlying data drifts. You are allowed to make API calls to the LLM, but you cannot fine-tune the model, and you must be able to explain your approach to auditors (i.e., you need a reproducible process and clear evaluation criteria).

Write an essay proposing an automated prompt design approach that (1) frames prompt optimization explicitly as a search problem (define the search space, search strategy, and performance estimation), (2) uses an iterative LLM-based prompt search loop (evaluation → pruning → expansion) to discover improved prompts over time, (3) incorporates an evolutionary computation mechanism (e.g., selection, mutation, and/or crossover) to generate novel prompt candidates, and (4) uses prompt ensembling to reduce sensitivity to wording and drift.
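The evaluation → pruning → expansion loop with evolutionary operators described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the seed prompts, fragment lists, and the toy `score` function are hypothetical stand-ins (a real `score` would call the LLM API on a held-out validation set and measure field-extraction accuracy).

```python
import random

# Hypothetical seed prompts for the extraction task.
SEED_PROMPTS = [
    "Extract customer_intent, urgency, and product from the message.",
    "Identify the intent, urgency level, and product in this chat message.",
    "Return JSON with customer_intent, urgency, product for the text below.",
]

def score(prompt: str) -> float:
    """Toy stand-in for validation accuracy; a real scorer would query the LLM."""
    return min(1.0, len(prompt) / 120)

def mutate(prompt: str) -> str:
    """Mutation: append an instruction fragment (illustrative only)."""
    fragments = [
        " Answer strictly in JSON.",
        " If unsure, output 'unknown'.",
        " Keep urgency one of: low, medium, high.",
    ]
    return prompt + random.choice(fragments)

def crossover(a: str, b: str) -> str:
    """Crossover: splice the first half of one prompt onto the second half of another."""
    return a[: len(a) // 2] + b[len(b) // 2 :]

def evolve(pop, generations=5, keep=2):
    for _ in range(generations):
        # Evaluation: score every candidate.
        ranked = sorted(pop, key=score, reverse=True)
        # Pruning (selection): keep only the top performers.
        survivors = ranked[:keep]
        # Expansion: refill the population via mutation and crossover.
        children = [mutate(random.choice(survivors)) for _ in range(2)]
        children.append(crossover(*random.sample(survivors, 2)))
        pop = survivors + children
    return max(pop, key=score)

best = evolve(SEED_PROMPTS)
print(best)
```

Because survivors carry over unchanged each generation, the best-so-far prompt is never lost (elitism), while mutation and crossover supply the exploratory pressure the essay should discuss.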

In your proposal, justify key tradeoffs and interactions among these components—for example, how ensembling changes what you optimize for during search, how evolutionary operators affect exploration vs. exploitation in the iterative loop, and how your performance estimation method avoids overfitting to a narrow validation set while remaining auditable. Conclude with concrete stopping conditions and what artifact(s) you would deploy (single prompt vs. ensemble, and how you would update it).
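The ensembling component in point (4) can be made concrete with a field-wise majority vote across prompt variants. Everything here is a hypothetical sketch: `extract_with_prompt` fakes the parsed per-prompt outputs that would normally come from separate LLM calls.

```python
from collections import Counter

def extract_with_prompt(prompt_id: int, message: str) -> dict:
    """Stand-in for an LLM call with one prompt wording; returns parsed fields."""
    fake = {
        0: {"customer_intent": "refund", "urgency": "high", "product": "router"},
        1: {"customer_intent": "refund", "urgency": "medium", "product": "router"},
        2: {"customer_intent": "refund", "urgency": "high", "product": "modem"},
    }
    return fake[prompt_id]

def ensemble_extract(message: str, n_prompts: int = 3) -> dict:
    """Field-wise majority vote across prompt variants to damp wording sensitivity."""
    outputs = [extract_with_prompt(i, message) for i in range(n_prompts)]
    result = {}
    for field in outputs[0]:
        votes = Counter(o[field] for o in outputs)
        result[field] = votes.most_common(1)[0][0]
    return result

print(ensemble_extract("My router is broken, I want my money back!"))
# → {'customer_intent': 'refund', 'urgency': 'high', 'product': 'router'}
```

Note the interaction this creates with search: once the deployed artifact is an ensemble, the search loop should select prompts whose *errors are uncorrelated*, not just the individually highest-scoring wordings.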

Updated 2026-02-06

Tags

Ch.3 Prompting - Foundations of Large Language Models


Foundations of Large Language Models Course

Computing Sciences

Data Science
