Failure Prediction Is a Better Performance Proxy for Early-Exit Networks Than Calibration
Abstract
Early-exit models accelerate inference by attaching internal classifiers to intermediate layers of a network, allowing computation to halt once a prediction satisfies a predefined exit criterion. Most early-exit methods rely on confidence-based exit strategies, a reliance that has motivated prior work to calibrate the intermediate classifiers in pursuit of better performance-efficiency trade-offs. In this paper, we argue that calibration metrics can be misleading indicators of multi-exit model performance. Specifically, we present empirical evidence that miscalibrated networks can outperform calibrated ones. As an alternative, we propose failure prediction as a more informative proxy for early-exit model performance. Unlike calibration, failure prediction captures changes in sample rankings and correlates strongly with efficiency gains, offering a more reliable basis for designing and evaluating early-exit models.
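The contrast the abstract draws can be made concrete with a small sketch. Assuming per-exit softmax outputs are available as an array (the shapes, threshold value, and function names below are illustrative, not the paper's implementation), the following Python shows a confidence-thresholded exit rule, a standard expected-calibration-error (ECE) estimate, and a ranking-based failure-prediction score: the AUROC of confidence as a separator of correct from incorrect predictions.

```python
import numpy as np

def early_exit(exit_probs, threshold=0.9):
    """Confidence-based early exit (illustrative sketch).

    exit_probs: (n_exits, n_samples, n_classes) softmax outputs of the
    intermediate classifiers, ordered from shallowest to deepest. Each
    sample leaves at the first exit whose top-class probability reaches
    `threshold`; the final exit always answers.
    """
    n_exits, n_samples, _ = exit_probs.shape
    preds = np.empty(n_samples, dtype=int)
    exits = np.empty(n_samples, dtype=int)
    for i in range(n_samples):
        for e in range(n_exits):
            if exit_probs[e, i].max() >= threshold or e == n_exits - 1:
                preds[i], exits[i] = exit_probs[e, i].argmax(), e
                break
    return preds, exits

def ece(conf, correct, n_bins=15):
    """Expected calibration error: bin samples by confidence and sum the
    |accuracy - confidence| gap per bin, weighted by bin size."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        m = (conf > lo) & (conf <= hi)
        if m.any():
            total += m.mean() * abs(correct[m].mean() - conf[m].mean())
    return total

def failure_auroc(conf, correct):
    """Failure prediction as ranking: the probability that a correctly
    classified sample receives higher confidence than a misclassified
    one (ties count half)."""
    pos, neg = conf[correct], conf[~correct]
    return ((pos[:, None] > neg[None, :]).mean()
            + 0.5 * (pos[:, None] == neg[None, :]).mean())
```

The sketch also makes the disagreement between the two metrics visible: any strictly increasing transform of the confidence scores can freely move `ece` while leaving `failure_auroc` unchanged, and at a matched exit rate it leaves the set of early-exiting samples unchanged too, which is why calibration alone can mislead about the performance-efficiency trade-off.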
Publication
In the Structured Probabilistic Inference and Generative Modeling Workshop at NeurIPS 2025