Case Study

Stabilizing a Polarity Classifier When Migrating from BERT to Prompt-Completion

You own a production sentiment (polarity) classifier for customer chat transcripts with the required label set {positive, neutral, negative}. The current system is a fine-tuned BERT single-text classifier that takes one transcript at a time and outputs a probability distribution over the three labels. To reduce serving cost, a team proposes switching to an LLM that classifies via prompt completion (text generation): the model generates free text, which is then mapped to one of the three labels.

After a pilot, you observe the following on the same 1,000 transcripts:

  • BERT outputs are always one of {positive, neutral, negative}.
  • The LLM often generates completions like: (a) "positive", (b) "mostly positive", (c) "mixed feelings", (d) "not great", (e) "the customer is satisfied overall", (f) "neutral/unclear".
  • Your current label-mapping rule is: if the generated text contains the substring "positive" → positive; else if it contains "negative" or "not" → negative; else if it contains "neutral" → neutral; else → neutral (see the sketch after this list).
  • Business stakeholders report a spike in escalations because many "mixed feelings" and "neutral/unclear" cases are being treated as negative in downstream workflows; completions for such transcripts often hedge with "not" (e.g., "mixed feelings, not strongly one way or the other"), which the rule above routes to negative.
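
To make the failure mode concrete, here is the current mapping rule written out as plain Python (a minimal sketch; the function name and the final hedged test string are illustrative, not part of the pilot data):

```python
def map_label(completion: str) -> str:
    """Current substring-based mapping rule, exactly as described above."""
    text = completion.lower()
    if "positive" in text:
        return "positive"
    if "negative" in text or "not" in text:
        return "negative"
    if "neutral" in text:
        return "neutral"
    return "neutral"  # default bucket


# The pilot completions listed above, plus one hypothetical hedged
# completion showing how wording trips the "not" branch:
examples = [
    "positive",                           # -> positive (correct)
    "mostly positive",                    # -> positive (acceptable)
    "mixed feelings",                     # -> neutral  (no keyword matched)
    "not great",                          # -> negative (via "not")
    "the customer is satisfied overall",  # -> neutral  (misroute: should be positive)
    "neutral/unclear",                    # -> neutral
    "mixed feelings, not strongly one way or the other",  # -> negative (misroute)
]
for c in examples:
    print(f"{c!r} -> {map_label(c)}")
```

Note that the rule only happens to handle the literal strings "mixed feelings" and "neutral/unclear" correctly; any hedged variant containing "not" falls into the negative branch.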

As the responsible ML lead, propose a revised end-to-end classification design that reduces these misroutes while still guaranteeing that the final output is exactly one of {positive, neutral, negative}. Cover both the prompt and the label-mapping approach, and state whether (and where) you would keep or replace the BERT classifier. Explain explicitly how your design uses prompt-completion behavior and label mapping to control outputs, and compare it with the BERT single-text classification approach in terms of how reliably each produces a valid label. One possible direction is sketched below.
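
One possible direction (a minimal sketch, not the reference answer): constrain the completion format in the prompt, then map with exact matching and a conservative fallback. The prompt wording, names, and fallback policy below are illustrative assumptions, not part of the case study:

```python
ALLOWED_LABELS = {"positive", "neutral", "negative"}

# Hypothetical prompt: constrain the completion so the mapping step
# has almost nothing left to interpret.
PROMPT_TEMPLATE = (
    "Classify the overall customer sentiment of the transcript below.\n"
    "Answer with exactly one word: positive, neutral, or negative.\n"
    "If the sentiment is mixed or unclear, answer: neutral.\n"
    "\n"
    "Transcript:\n{transcript}\n"
    "\n"
    "Label:"
)


def map_label_strict(completion: str) -> str:
    """Exact-match mapping with a conservative fallback.

    Anything that is not literally one of the three labels (after
    trimming whitespace, case, and trailing punctuation) falls back to
    neutral. The fallback could instead route to a retained BERT
    classifier or to human review; that policy choice is not shown here.
    """
    token = completion.strip().lower().strip(".!\"'")
    return token if token in ALLOWED_LABELS else "neutral"


# Pilot-style completions now map safely instead of leaking to negative:
for c in ["positive", "Mostly positive", "mixed feelings",
          "neutral/unclear", "Negative."]:
    print(f"{c!r} -> {map_label_strict(c)}")
```

The design contrast this sketch is meant to surface: BERT's softmax head produces a valid label by construction (argmax over exactly three classes), whereas prompt completion only approximates that guarantee through the combination of a constrained prompt, strict mapping, and an explicit fallback policy, which is why a hybrid that keeps BERT for fallback or low-confidence routing is worth considering.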
