This research paper has been accepted by Knowledge-Based Systems.
Title: Continuous Optimization for Construction of Neural Network-Based Prediction Intervals
Authors: Long Xue, Kai Zhou, Xiaoge Zhang
Abstract: Principled quantification of predictive uncertainty in neural networks (NNs) is essential to safeguard their applications in high-stakes decision settings. In this paper, we develop a differentiable mathematical formulation to quantify the uncertainty in NN prediction using prediction intervals (PIs). The formulated optimization problem is compatible with the built-in gradient descent optimizers in prevailing deep learning platforms, and two performance metrics, prediction interval coverage probability (PICP) and mean prediction interval width (MPIW), are considered in the construction of PIs. Unlike existing methods, the developed methodology features four salient characteristics. Firstly, we design two differentiable distance-based functions to impose constraints associated with the target coverage in PI construction, where PICP is prioritized explicitly over MPIW in the devised composite loss function. Secondly, we adopt a shared-bottom NN architecture with intermediate layers to separate the learning of shared and task-specific feature representations in the construction of lower and upper bounds. Thirdly, we leverage the projection of conflicting gradients (PCGrad) to mitigate interference between the gradients associated with the two individual learning tasks so as to increase the convergence stability and solution quality. Finally, we design a customized early stopping mechanism that monitors PICP and MPIW simultaneously in order to select, as the final NN parameters, the set of parameters that not only meets the target coverage but also has a minimal MPIW. A broad range of datasets is used to rigorously examine the performance of the developed methodology. Computational results suggest that the developed method significantly outperforms the classic LUBE method across the nine datasets by reducing the PI width by 31.26% on average. More importantly, it achieves competitive results compared to the other three state-of-the-art methods by outperforming them on four out of ten datasets. An ablation study is used to explicitly demonstrate the benefit of the shared-bottom NN architecture in the construction of PIs.
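
To make the two metrics and the gradient projection named in the abstract concrete, the following is a minimal NumPy sketch rather than the authors' implementation. It assumes the standard definitions: PICP as the fraction of targets that fall inside their interval, MPIW as the average interval width, and the pairwise PCGrad step that removes from one task gradient its component along a conflicting gradient. The function names (picp, mpiw, project_conflicting) and the toy numbers are hypothetical and chosen only for illustration; the paper's differentiable loss, network architecture, and early stopping rule are not reproduced here.

```python
import numpy as np


def picp(y, lower, upper):
    """Prediction interval coverage probability: share of targets inside [lower, upper]."""
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    return float(np.mean((y >= lower) & (y <= upper)))


def mpiw(lower, upper):
    """Mean prediction interval width: average of (upper - lower)."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return float(np.mean(upper - lower))


def project_conflicting(g_i, g_j):
    """Pairwise PCGrad step (illustrative): if g_i and g_j conflict, i.e. their
    dot product is negative, subtract from g_i its projection onto g_j.
    Otherwise return g_i unchanged."""
    g_i = np.asarray(g_i, dtype=float)
    g_j = np.asarray(g_j, dtype=float)
    dot = float(np.dot(g_i, g_j))
    if dot < 0.0:
        return g_i - (dot / float(np.dot(g_j, g_j))) * g_j
    return g_i


# Toy data (hypothetical): four targets with their lower and upper bounds.
y_true = [1.0, 2.0, 3.0, 4.0]
lower = [0.5, 2.5, 2.0, 3.0]
upper = [1.5, 3.5, 4.0, 5.0]

print(picp(y_true, lower, upper))  # 0.75: the second target lies below its lower bound
print(mpiw(lower, upper))          # 1.5

# Two conflicting toy gradients, standing in for the lower- and upper-bound tasks.
g_lower = [1.0, 0.0]
g_upper = [-1.0, 1.0]
print(project_conflicting(g_lower, g_upper))  # [0.5 0.5], orthogonal to g_upper
```

In this sketch a set of parameters would be judged by first checking whether picp reaches the target coverage and only then comparing mpiw, which mirrors the priority of coverage over width described in the abstract.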