Research paper is accepted by Decision Support Systems.
Title: Reliability Estimation for Individual Predictions in Machine Learning Systems: A Model Reliability-Based Approach
Authors: Xiaoge Zhang and Indranil Bose
Abstract: The conventional aggregated performance measure (i.e., mean squared error) with respect to the whole dataset would not provide desired safety and quality assurance for each individual prediction made by a machine learning model in risk-sensitive regression problems. In this paper, we propose an informative indicator $\mathcal{R} \left(\bm{x} \right)$ to quantify model reliability for individual prediction (MRIP) for the purpose of safeguarding the usage of machine learning (ML) models in mission-critical applications. Specifically, we define the reliability of a ML model with respect to its prediction on each individual input $\bm{x}$ as the probability of the observed difference between the prediction of ML model and the actual observation falling within a small interval when the input $\bm{x}$ varies within a small range subject to a preset distance constraint, namely $\mathcal{R}(\bm{x}) = P(|y^* – \hat{y}^*| \leq \varepsilon | \bm{x}^* \in B(\bm{x}))$, where $y^*$ denotes the observed target value for the input $\bm{x}^*$, $\hat{y}^*$ denotes the model prediction for the input $\bm{x}^*$, and $\bm{x}^*$ is an input in the neighborhood of $\bm{x}$ subject to the constraint $B\left( \bm{x} \right) = \left\{ {\left. {{\bm{x}^*}} \right|\left\| {{\bm{x}^*} – \bm{x}} \right\| \le \delta } \right\}$. The developed MRIP indicator $\mathcal{R} \left(\bm{x} \right)$ provides a direct, objective, quantitative, and general-purpose measure of “reliability” or the probability of success of the ML model for each individual prediction by fully exploiting the local information associated with the input $\bm{x}$ and ML model. Next, to mitigate the intensive computational effort involved in MRIP estimation, we develop a two-stage ML-based framework to directly learn the relationship between $\bm{x}$ and its MRIP $\mathcal{R} \left( \bm{x} \right)$, thus enabling to provide the reliability estimate $\mathcal{R} \left( \bm{x} \right)$ for any unseen input instantly. Thirdly, we propose an information gain-based approach to help determine a threshold value pertaing to $\mathcal{R} \left( \bm{x} \right)$ in support of decision makings on when to accept or abstain from counting on the ML model prediction. Comprehensive computational experiments and quantitative comparisons with existing methods on a broad range of real-world datasets reveal that the developed ML-based framework for MRIP estimation shows a robust performance in improving the reliability estimate of individual prediction, and the MRIP indicator $\mathcal{R} \left( \bm{x} \right)$ thus provides an essential layer of safety net when adopting ML models in risk-sensitive environments.