1Cademy - Create a Self-Improving Prompt System with Ensemble Gating and Evolutionary Search

Case Study

You own an internal LLM feature that drafts first-pass responses to employee IT helpdesk tickets. The feature must (a) keep average latency under 2.5 seconds, (b) keep average cost under $0.02 per ticket, and (c) maintain high reliability across weekly model version updates. You have a labeled evaluation set of 2,000 historical tickets with gold categories and a lightweight automatic grader that is correct ~90% of the time (it is noisy but cheap). You can afford at most 50,000 total LLM calls per week for optimization and monitoring.
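The numbers in this scenario already constrain the design, and a few lines of arithmetic make that visible. The sketch below is illustrative only. It assumes that scoring one prompt on one ticket costs one LLM call, that the grader itself does not consume LLM calls, and that the grader's roughly 10% errors are symmetric and independent of the prompt. None of these assumptions is stated in the scenario, and all function names are hypothetical.

```python
import math

WEEKLY_CALL_BUDGET = 50_000   # from the scenario
EVAL_SET_SIZE = 2_000         # from the scenario
GRADER_ACCURACY = 0.90        # from the scenario (approximate)


def full_evaluations_per_week(budget=WEEKLY_CALL_BUDGET, eval_size=EVAL_SET_SIZE):
    """How many candidate prompts could be scored on the entire evaluation set.

    Assumes one LLM call per (prompt, ticket) pair and a grader that is free.
    """
    return budget // eval_size


def standard_error(p, n):
    """Binomial standard error of an accuracy estimate p measured on n tickets."""
    return math.sqrt(p * (1.0 - p) / n)


def debias_observed_rate(observed, grader_accuracy=GRADER_ACCURACY):
    """Estimate true accuracy from the pass rate reported by a noisy grader.

    Assumes symmetric grader errors, so that
        observed = true * q + (1 - true) * (1 - q)
    where q is the grader accuracy. Solving for `true` gives the line below.
    The result is clamped to the interval [0, 1].
    """
    q = grader_accuracy
    estimate = (observed + q - 1.0) / (2.0 * q - 1.0)
    return min(1.0, max(0.0, estimate))


if __name__ == "__main__":
    print("full-set evaluations per week:", full_evaluations_per_week())
    print("std. error at p=0.8 on 200 tickets:", round(standard_error(0.8, 200), 4))
    print("debiased accuracy for observed 0.74:", round(debias_observed_rate(0.74), 4))
```

Under these assumptions only a couple of dozen candidates can be scored on the full set each week, which is one reason a design would likely score most candidates on small subsamples and reserve full-set evaluation (and human spot checks) for finalists.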

Design an end-to-end automated prompt design system that (1) treats prompt optimization explicitly as a search problem, (2) uses an iterative LLM-based prompt search loop (evaluation → pruning → expansion), (3) incorporates an evolutionary computation component (e.g., mutation/crossover) to generate novel prompt candidates, and (4) deploys prompt ensembling in production with a clear aggregation method and a rule for when to run one prompt versus several prompts so that the system stays within its latency and cost limits.
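To make requirements (1) through (3) concrete, the sketch below shows one possible shape for the evaluation → pruning → expansion loop with mutation and crossover. It is a toy under stated assumptions, not a reference solution: the slot-based search space, the `toy_score` function (a stand-in for running a prompt on sampled tickets and averaging the grader's verdicts), and every parameter value are hypothetical.

```python
import random

# Hypothetical search space: each prompt is a choice of one value per slot.
SEARCH_SPACE = {
    "role": ["You are an IT helpdesk agent.", "You are a senior support engineer."],
    "style": ["Answer concisely.", "Answer step by step."],
    "n_examples": [0, 2, 4],
}


def random_prompt(rng):
    return {slot: rng.choice(options) for slot, options in SEARCH_SPACE.items()}


def mutate(prompt, rng):
    """Resample one randomly chosen slot (the new value may equal the old one)."""
    child = dict(prompt)
    slot = rng.choice(list(SEARCH_SPACE))
    child[slot] = rng.choice(SEARCH_SPACE[slot])
    return child


def crossover(a, b, rng):
    """Uniform crossover: each slot is inherited from one of the two parents."""
    return {slot: rng.choice([a[slot], b[slot]]) for slot in SEARCH_SPACE}


def toy_score(prompt):
    """Stand-in for the real estimator (LLM calls plus the noisy grader)."""
    target = {"role": SEARCH_SPACE["role"][1], "style": SEARCH_SPACE["style"][0], "n_examples": 2}
    return sum(prompt[slot] == target[slot] for slot in target)


def search(score_fn, rng, pop_size=6, keep=2, max_rounds=10, patience=3):
    """Iterate evaluation -> pruning -> expansion until the budget or patience runs out."""
    population = [random_prompt(rng) for _ in range(pop_size)]
    best, best_score, stale = None, float("-inf"), 0
    for _ in range(max_rounds):
        # Evaluation: score every candidate once per round.
        scores = [score_fn(p) for p in population]
        order = sorted(range(len(population)), key=lambda i: scores[i], reverse=True)
        # Pruning: keep only the top `keep` candidates (elitism).
        survivors = [population[i] for i in order[:keep]]
        if scores[order[0]] > best_score:
            best, best_score, stale = survivors[0], scores[order[0]], 0
        else:
            stale += 1
        # Stopping condition: no improvement for `patience` consecutive rounds.
        if stale >= patience:
            break
        # Expansion: refill the population with mutated crossovers of survivors.
        children = []
        while len(children) < pop_size - keep:
            parent_a, parent_b = rng.sample(survivors, 2)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = survivors + children
    return best, best_score


if __name__ == "__main__":
    winner, score = search(toy_score, random.Random(0))
    print(score, winner)
```

In this sketch `keep` controls exploitation (how strongly the loop commits to current winners), while mutation supplies exploration. A real system would also cache scores for unchanged survivors so that repeated evaluation does not consume the weekly call budget.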

Your design must be concrete: specify the search space you will explore (what parts of the prompt can change), the search strategy (including stopping conditions), the performance estimation approach (how you will use the noisy grader and any human spot-checking), how evolutionary operators will be applied within the iterative loop, and how the ensemble will be constructed and aggregated (e.g., majority vote, weighted vote, or another method) including how weights/gating are learned from evaluation data. Provide enough detail that an engineer could implement the workflow and explain the key tradeoffs you are making between exploration vs exploitation and reliability vs cost/latency.
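One way the ensemble and gating portion might look is sketched below. It assumes that each prompt outputs a single ticket category, that vote weights are the log-odds of each prompt's accuracy on the evaluation set, and that the gate trusts the primary prompt alone whenever it predicts a category on which its evaluation-set precision was high. These are illustrative choices, not requirements of the case study, and all names are hypothetical.

```python
import math
from collections import defaultdict


def log_odds_weight(accuracy, eps=1e-3):
    """Vote weight from evaluation accuracy; clipped to avoid infinite weights."""
    a = min(1.0 - eps, max(eps, accuracy))
    return math.log(a / (1.0 - a))


def weighted_vote(predictions, weights):
    """predictions: {prompt_name: label}. Returns the label with the largest total weight.

    Ties are broken alphabetically so the result is deterministic.
    """
    totals = defaultdict(float)
    for name, label in predictions.items():
        totals[label] += weights[name]
    return max(sorted(totals), key=lambda label: totals[label])


def learn_gate(primary_preds, gold, min_precision=0.9):
    """Labels for which the primary prompt's precision on the evaluation set is high enough."""
    hits, counts = defaultdict(int), defaultdict(int)
    for pred, truth in zip(primary_preds, gold):
        counts[pred] += 1
        hits[pred] += int(pred == truth)
    return {label for label in counts if hits[label] / counts[label] >= min_precision}


def answer(ticket, prompts, weights, trusted_labels, primary):
    """Run one prompt when the gate allows it, otherwise the full ensemble.

    prompts: {prompt_name: callable(ticket) -> label}, stand-ins for LLM calls.
    Returns (label, number_of_prompt_calls).
    """
    first = prompts[primary](ticket)
    if first in trusted_labels:
        return first, 1
    predictions = {name: fn(ticket) for name, fn in prompts.items() if name != primary}
    predictions[primary] = first
    return weighted_vote(predictions, weights), len(prompts)


if __name__ == "__main__":
    weights = {"p1": log_odds_weight(0.9), "p2": log_odds_weight(0.8), "p3": log_odds_weight(0.8)}
    prompts = {"p1": lambda t: "password", "p2": lambda t: "printer", "p3": lambda t: "printer"}
    trusted = learn_gate(["vpn", "vpn", "password", "password"], ["vpn", "vpn", "password", "printer"])
    print(answer("My badge reader is offline", prompts, weights, trusted, primary="p1"))
```

The precision threshold is the reliability versus cost dial: raising it sends more tickets to the full ensemble, which costs more calls per ticket. In production the escalated calls would probably be issued in parallel so that the ensemble path adds roughly one extra round trip rather than several.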

Updated 2026-02-06
