1Cademy - Threshold Tuning in Cascading Systems

Learn Before

Cascading Models at Inference Time

Short Answer

Threshold Tuning in Cascading Systems

In a cascading system, a small, fast model handles each input first, and a large, slow model is called only when necessary. A confidence threshold determines whether the small model's output is accepted. Describe the two opposing risks a system designer must balance when setting this threshold.
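The routing rule described in the question can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the `Model` interface (a callable returning an answer and a confidence in [0, 1]), the `cascade` function, and the two toy models are all hypothetical names introduced here.

```python
from typing import Callable, Tuple

# Assumed interface: a model maps an input string to (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


def cascade(x: str, small: Model, large: Model, threshold: float) -> Tuple[str, str]:
    """Return (answer, name of the model that produced it)."""
    answer, confidence = small(x)
    if confidence >= threshold:
        # The small model is confident enough: accept its output.
        return answer, "small"
    # Otherwise escalate the input to the large model.
    answer, _ = large(x)
    return answer, "large"


# Toy stand-ins for real models (hypothetical, for illustration only).
def toy_small(x: str) -> Tuple[str, float]:
    label = "short" if len(x) < 10 else "long"
    confidence = 0.9 if len(x) < 5 else 0.6
    return label, confidence


def toy_large(x: str) -> Tuple[str, float]:
    label = "short" if len(x) < 10 else "long"
    return label, 0.99


if __name__ == "__main__":
    for t in (0.5, 0.8):
        print(t, cascade("hello world", toy_small, toy_large, threshold=t))
```

Running the same input at different values of `threshold` shows how the setting changes which inputs are escalated to the large model.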

Updated 2025-10-10
