Learn Before
Cost-Based Stopping Criteria
Decoding in an LLM can be terminated based on real-world costs, such as a limit on computational resources or on elapsed time. This approach is particularly valuable in latency-sensitive applications such as real-time chatbots, where a response must be produced within a fixed time budget so that the interaction stays responsive, even if the output is cut short.
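As a rough illustration, a decoding loop with a cost-based stop might look like the sketch below. It is not taken from the source: the names `generate_with_budget`, `next_token`, `max_seconds`, `max_steps`, and `toy_next_token` are hypothetical, and the "model" is a stand-in function. The loop checks a wall-clock deadline and a step budget (a simple proxy for compute) before each decoding step and reports which condition ended generation.

```python
import time
from typing import Callable, List, Tuple


def generate_with_budget(
    next_token: Callable[[List[str]], str],   # hypothetical one-step decoder
    max_seconds: float,                       # wall-clock budget (assumed parameter)
    max_steps: int,                           # step budget, a proxy for compute cost
    eos: str = "<eos>",
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[str], str]:
    """Decode until EOS, the step budget, or the time budget is reached."""
    deadline = clock() + max_seconds
    tokens: List[str] = []
    for _ in range(max_steps):
        # Cost-based check: stop before starting a step once the deadline has passed.
        if clock() >= deadline:
            return tokens, "time_budget"
        tok = next_token(tokens)
        if tok == eos:
            return tokens, "eos"
        tokens.append(tok)
    return tokens, "step_budget"


def toy_next_token(prefix: List[str]) -> str:
    """Stand-in for a model call: sleeps briefly and never emits EOS."""
    time.sleep(0.01)
    return f"tok{len(prefix)}"


if __name__ == "__main__":
    out, reason = generate_with_budget(toy_next_token, max_seconds=0.1, max_steps=1000)
    print(len(out), reason)
```

Because the deadline is checked only between steps, a single slow step can still overrun the budget; a stricter deployment would also need a timeout on the model call itself. The returned tokens may end mid-sentence, which is the trade-off accepted in exchange for a bounded response time.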
Tags
Ch.5 Inference - Foundations of Large Language Models
Foundations of Large Language Models
Foundations of Large Language Models Course
Computing Sciences
Related
End-of-Sequence (EOS) Token as a Stopping Criterion
Sequence Count as a Stopping Criterion in Beam Search
Maximum Output Length as a Stopping Criterion
Cost-Based Stopping Criteria
Behavior-Based Stopping Criteria
Debugging an Uncontrolled Text Generation System
A developer is testing a new text-generation system. They find that when prompted, the system produces a relevant initial response but then continues to generate a long, rambling stream of unrelated text until it is manually interrupted. What is the most fundamental problem with the system's configuration that leads to this behavior?
Consequences of Unbounded Text Generation
Learn After
Evaluating Stopping Criteria for a Time-Sensitive Application
An engineering team is deploying a large language model for a live customer support chatbot. The primary business requirement is to ensure that no user waits more than two seconds for an initial response, even if it means the response is slightly incomplete. Which of the following rules for ending the text generation process is best aligned with this requirement?
Trade-offs in Cost-Based Text Generation