Research paper accepted by Decision Support Systems
The conventional aggregated performance measure (e.g., mean squared error) computed over the whole dataset does not provide the desired safety and quality assurance for each individual prediction made by a machine learning (ML) model in risk-sensitive regression problems. In this paper, we propose an informative indicator $\mathcal{R}\left(\bm{x}\right)$ to quantify model reliability for individual prediction (MRIP), with the aim of safeguarding the use of ML models in mission-critical applications. Specifically, we define the reliability of an ML model with respect to its prediction on an individual input $\bm{x}$ as the probability that the difference between the model prediction and the actual observation falls within a small interval while the input $\bm{x}$ varies within a small range subject to a preset distance constraint, namely $\mathcal{R}(\bm{x}) = P\left( |y^* - \hat{y}^*| \leq \varepsilon \mid \bm{x}^* \in B(\bm{x}) \right)$, where $y^*$ denotes the observed target value for the input $\bm{x}^*$, $\hat{y}^*$ denotes the model prediction for $\bm{x}^*$, and $\bm{x}^*$ is an input in the neighborhood of $\bm{x}$ defined by $B\left( \bm{x} \right) = \left\{ \bm{x}^* \mid \left\| \bm{x}^* - \bm{x} \right\| \le \delta \right\}$. The MRIP indicator $\mathcal{R}\left(\bm{x}\right)$ provides a direct, objective, quantitative, and general-purpose measure of the reliability, or probability of success, of an ML model for each individual prediction by fully exploiting the local information associated with the input $\bm{x}$ and the ML model. Second, to mitigate the intensive computational effort involved in MRIP estimation, we develop a two-stage ML-based framework that directly learns the relationship between $\bm{x}$ and its MRIP $\mathcal{R}\left( \bm{x} \right)$, thus providing the reliability estimate $\mathcal{R}\left( \bm{x} \right)$ for any unseen input instantly. Third, we propose an information gain-based approach to determine a threshold value on $\mathcal{R}\left( \bm{x} \right)$ in support of decisions on when to accept or abstain from relying on the ML model prediction. Comprehensive computational experiments and quantitative comparisons with existing methods on a broad range of real-world datasets reveal that the developed ML-based framework for MRIP estimation performs robustly in improving the reliability estimates of individual predictions, and that the MRIP indicator $\mathcal{R}\left( \bm{x} \right)$ thus provides an essential safety net when adopting ML models in risk-sensitive environments.
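To make the definition concrete, the conditional probability $\mathcal{R}(\bm{x})$ can be approximated empirically. The following is a minimal sketch, not the paper's implementation, assuming that training points falling inside the ball $B(\bm{x})$ serve as samples of $\bm{x}^*$ and that the model exposes a scikit-learn-style `predict` method; the function name `mrip_estimate` and the default values of `delta` and `eps` are illustrative.

```python
import numpy as np

def mrip_estimate(model, X_train, y_train, x, delta=0.1, eps=0.5):
    """Empirical estimate of R(x) = P(|y* - f(x*)| <= eps | x* in B(x)).

    Illustrative sketch: uses training points inside the delta-ball
    around x as the samples of x*.
    """
    # Select all training inputs within distance delta of x.
    dists = np.linalg.norm(X_train - x, axis=1)
    mask = dists <= delta
    if not mask.any():
        return np.nan  # no neighbors: R(x) cannot be estimated locally
    # Fraction of neighbors whose absolute prediction error is within eps.
    errors = np.abs(y_train[mask] - model.predict(X_train[mask]))
    return float(np.mean(errors <= eps))
```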
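Building on the `mrip_estimate` sketch above, the two-stage idea could look as follows: stage one labels each training input with its estimated MRIP, and stage two fits an off-the-shelf regressor mapping $\bm{x}$ directly to $\mathcal{R}(\bm{x})$, so that the reliability estimate for an unseen input costs a single prediction. The helper name `fit_mrip_model` and the choice of a random forest are assumptions, not the paper's design.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fit_mrip_model(model, X_train, y_train, delta=0.1, eps=0.5):
    # Stage 1: estimate R(x) for every training input
    # (assumes `mrip_estimate` from the previous sketch is in scope).
    r_labels = np.array([mrip_estimate(model, X_train, y_train, x, delta, eps)
                         for x in X_train])
    keep = ~np.isnan(r_labels)  # drop inputs with no delta-neighbors
    # Stage 2: learn the mapping x -> R(x) with an off-the-shelf regressor.
    reliability_model = RandomForestRegressor(n_estimators=200, random_state=0)
    reliability_model.fit(X_train[keep], r_labels[keep])
    return reliability_model  # reliability_model.predict(x_new) ~ R(x_new)
```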
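For the information gain-based threshold, one plausible reading is a decision-stump-style scan: on a validation set, label each prediction a success if its absolute error is within $\varepsilon$, then pick the cutoff on $\mathcal{R}(\bm{x})$ that maximizes the information gain of the accept/abstain split. The sketch below, with the hypothetical helpers `entropy` and `select_threshold`, illustrates this under those assumptions.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a binary (boolean) label array."""
    p = np.mean(labels)
    if p == 0.0 or p == 1.0:
        return 0.0
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def select_threshold(r_values, success_labels):
    """Scan candidate cutoffs on R(x); keep the one with maximal gain."""
    base = entropy(success_labels)
    best_t, best_gain = None, -np.inf
    for t in np.unique(r_values):
        accept = r_values >= t
        if accept.all() or (~accept).all():
            continue  # a degenerate split carries no information
        gain = base - (accept.mean() * entropy(success_labels[accept])
                       + (~accept).mean() * entropy(success_labels[~accept]))
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t
```

Predictions with $\mathcal{R}(\bm{x})$ below the selected threshold would then be deferred to a human expert or a fallback procedure rather than accepted from the ML model.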