1Cademy - Cost-Based Stopping Criteria

Learn Before

Stopping Criteria in LLM Inference

Concept

Cost-Based Stopping Criteria

LLM decoding can be terminated based on real-world costs, such as a budget on computational resources or on wall-clock time. This approach is particularly valuable in time-sensitive applications such as real-time chatbots, where a response must be produced within a fixed time frame to keep the interaction responsive, even if the output is cut short.
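
As an illustration, a cost-based stopping rule can be sketched as a decoding loop that checks a budget before each step. This is a minimal sketch, not a reference implementation: the function name generate_with_budget, the next_token callable standing in for a model's decoding step, the EOS marker, and the injectable clock are all assumptions introduced for this example.

```python
import time
from typing import Callable, List, Optional, Tuple

EOS = "<eos>"  # hypothetical end-of-sequence marker for this sketch


def generate_with_budget(
    next_token: Callable[[List[str]], str],  # stand-in for one model decoding step
    prompt: List[str],
    max_seconds: Optional[float] = None,     # wall-clock budget, if any
    max_new_tokens: Optional[int] = None,    # compute budget as a token count, if any
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[str], str]:
    """Decode until EOS or until a cost budget is exhausted.

    Returns the generated tokens and the reason decoding stopped.
    """
    start = clock()
    output: List[str] = []
    while True:
        # Cost-based checks run before each decoding step.
        if max_new_tokens is not None and len(output) >= max_new_tokens:
            return output, "token_budget"
        if max_seconds is not None and clock() - start >= max_seconds:
            return output, "time_budget"
        token = next_token(prompt + output)
        if token == EOS:
            return output, "eos"  # natural, content-based stop
        output.append(token)
```

Because the budget is only checked between steps, a single slow decoding step can still overshoot the time limit; a deployment with a hard deadline would likely set max_seconds somewhat below it.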

Updated 2026-05-05

References

Reference of Foundations of Large Language Models Course

Learn After

Evaluating Stopping Criteria for a Time-Sensitive Application
An engineering team is deploying a large language model for a live customer support chatbot. The primary business requirement is to ensure that no user waits more than two seconds for an initial response, even if it means the response is slightly incomplete. Which of the following rules for ending the text generation process is best aligned with this requirement?
Trade-offs in Cost-Based Text Generation
