Research paper accepted by Reliability Engineering & System Safety

Although machine learning (ML) and deep learning (DL) methods are increasingly used for anomaly detection in industrial cyber-physical systems (ICPS), their adoption is hindered by concerns about model trustworthiness, especially high false alarm rates (FARs). Excessive false alarms overwhelm operators, cause unnecessary shutdowns, and reduce operational efficiency. This study addresses these challenges by proposing a novel framework that integrates ML-based anomaly detectors with conformal prediction (CP), a model-agnostic uncertainty quantification technique. To handle distribution shifts in time-series data, our framework incorporates a temporal quantile adjustment method with a sliding calibration set, ensuring statistical guarantees on predefined FARs. A rejection mechanism is further integrated by excluding significant anomalies from the calibration set, improving detection capability while maintaining FAR guarantees. For real-time anomaly monitoring, two p-value-based indicators generated from CP are developed to track anomalous trends and enhance model interpretability. The framework is evaluated by comparing several baseline ML and DL methods with their conformalized counterparts on a public ICPS dataset. Comparative results based on Precision, Recall, F1, and AUROC validate the framework’s compatibility with various ML models and its effectiveness in improving anomaly detection performance by reducing false alarms and guaranteeing FARs across a range of predefined values.
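To make the idea concrete, the following is a minimal sketch of conformal anomaly detection with a sliding calibration set. It is not the paper's temporal quantile adjustment method. The function names, the window size, and the rule of keeping flagged points out of the calibration set are illustrative assumptions. The anomaly scores are assumed to come from any underlying detector, with larger scores meaning more anomalous.

```python
from collections import deque


def conformal_p_value(score, calibration):
    """Conformal p-value: share of calibration scores at least as large
    as the new score, with the usual +1 correction."""
    n_ge = sum(1 for s in calibration if s >= score)
    return (n_ge + 1) / (len(calibration) + 1)


def monitor(scores, initial_calibration, alpha=0.05, window=200):
    """Flag scores whose p-value is <= alpha (the target false alarm rate).

    Hypothetical rejection rule: flagged points are NOT added to the
    sliding calibration window, so it keeps tracking normal behaviour.
    """
    calibration = deque(initial_calibration, maxlen=window)
    results = []
    for s in scores:
        p = conformal_p_value(s, calibration)
        is_anomaly = p <= alpha
        results.append((p, is_anomaly))
        if not is_anomaly:
            calibration.append(s)  # oldest score drops out when the window is full
    return results
```

A small p-value means that few recent normal scores are as extreme as the current one, so the p-value itself can be tracked over time as an indicator of an emerging anomaly.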

Research paper accepted by IEEE Transactions on Systems, Man, and Cybernetics: Systems

Neural networks that overlook the underlying causal relationships among observed variables pose significant risks in high-stakes decision-making contexts because of concerns about the robustness and stability of model performance. To tackle this issue, we present a general approach for embedding the hierarchical causal structure among observed variables into a neural network to inform its learning. The proposed methodology, termed causality-informed neural network (CINN), exploits the hierarchical causal structure learned from observational data as a structurally informed prior to guide the layer-to-layer architectural design of the neural network while maintaining the orientation of causal relationships in the discovered causal graph. The proposed method involves three steps. First, CINN mines causal relationships from observational data via directed acyclic graph (DAG) learning, where causal discovery is recast as a continuous optimization problem to circumvent the combinatorial nature of DAG learning. Second, we encode the discovered hierarchical causal graph among observed variables into the neural network via a dedicated architecture and loss function. By classifying observed variables in the DAG as root, intermediate, and leaf nodes, we translate the hierarchical causal DAG into CINN by creating a one-to-one correspondence between DAG nodes and certain CINN neurons. For the loss function, both intermediate and leaf nodes in the DAG are treated as target outputs during CINN training, facilitating the co-learning of causal relationships among the observed variables. Finally, as multiple loss components emerge in CINN, we leverage the projection of conflicting gradients to mitigate gradient interference among the multiple learning tasks. Computational studies indicate that CINN outperforms several state-of-the-art methods across a broad range of datasets. In addition, an ablation study that incrementally incorporates structural and quantitative causal knowledge into the neural network is conducted to highlight the pivotal role of causal knowledge in enhancing the neural network’s prediction performance.
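The gradient projection step mentioned in the abstract can be illustrated with a small, framework-free sketch. It assumes each loss component yields a flat gradient vector, and it processes the other tasks in a fixed order rather than a random one. The function names are hypothetical, and this is not the CINN implementation.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_conflicting(grads):
    """For each task gradient, remove the component that points against
    any other task's original gradient (negative inner product)."""
    projected = []
    for i, g in enumerate(grads):
        g = [float(x) for x in g]
        for j, other in enumerate(grads):
            if i == j:
                continue
            d = dot(g, other)
            if d < 0:  # conflict: project g onto the normal plane of `other`
                scale = d / dot(other, other)
                g = [a - scale * b for a, b in zip(g, other)]
        projected.append(g)
    return projected


# Two conflicting task gradients (their inner product is -1).
print(project_conflicting([[1, 0], [-1, 1]]))
```

After projection, each adjusted gradient is orthogonal to the gradient it conflicted with, so summing them for the parameter update no longer lets one task directly undo another's progress.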

Prof. Zhisheng Ye delivered a talk on “Optimal Abort Policy for Mission-Critical Systems under Imperfect Condition Monitoring”

While most on-demand mission-critical systems are engineered to be reliable to support critical tasks, occasional failures may still occur during missions. To increase system survivability, a common practice is to abort the mission before an imminent failure. We consider optimal mission abort for a system whose deterioration follows a general three-state (normal, defective, failed) semi-Markov chain. Failure is assumed to be self-revealing, while the normal and defective states have to be inferred from imperfect condition monitoring data. Due to the non-Markovian process dynamics, optimal mission abort for this partially observable system is an intractable stopping problem. For a tractable solution, we introduce a novel tool of Erlang mixtures to approximate non-exponential sojourn times in the semi-Markov chain. This allows us to approximate the original process by a surrogate continuous-time Markov chain whose optimal control policy can be solved through a partially observable Markov decision process (POMDP). We show that the POMDP optimal policies converge almost surely to the optimal abort decision rules as the Erlang rate parameter diverges. This implies that the expected cost of adopting the POMDP solution converges to the optimal expected cost. Next, we provide comprehensive structural results on the optimal policy of the surrogate POMDP. Based on these results, we develop a modified point-based value iteration algorithm to numerically solve the surrogate POMDP. We further consider mission abort in a multi-task setting where a system executes several tasks consecutively before a thorough inspection. Through a case study on an unmanned aerial vehicle, we demonstrate that our model can be implemented in real time, even when the condition-monitoring signals are generated at high frequency.
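As a rough illustration of the Erlang-mixture idea, the sketch below uses the classical construction in which all Erlang components share one rate and the weight on the k-th component is the probability the target distribution places on the interval ((k-1)/rate, k/rate]. Whether the talk uses exactly this construction is an assumption. The Weibull sojourn time and all function names are hypothetical choices for the example.

```python
import math


def erlang_mixture_weights(cdf, rate, n_phases):
    """Weight on Erlang(k, rate): target mass in ((k-1)/rate, k/rate]."""
    w = [cdf(k / rate) - cdf((k - 1) / rate) for k in range(1, n_phases + 1)]
    w[-1] += 1.0 - cdf(n_phases / rate)  # lump the truncated tail into the last phase
    return w


def erlang_mixture_cdf(t, weights, rate):
    """Mixture CDF at t, using P(Erlang(k, rate) <= t) = P(Poisson(rate*t) >= k)."""
    pmf = math.exp(-rate * t)  # P(N = 0)
    tail = 1.0 - pmf           # P(N >= 1)
    total = 0.0
    for k, w in enumerate(weights, start=1):
        total += w * max(tail, 0.0)
        pmf *= rate * t / k    # now P(N = k)
        tail -= pmf            # now P(N >= k + 1)
    return total


def weibull_cdf(t):
    """Hypothetical non-exponential sojourn time: Weibull, shape 2, scale 1."""
    return 1.0 - math.exp(-t ** 2)


def approximation_error(rate, t=1.0, horizon=6.0):
    """Absolute CDF error of the Erlang mixture at time t."""
    weights = erlang_mixture_weights(weibull_cdf, rate, int(horizon * rate))
    return abs(erlang_mixture_cdf(t, weights, rate) - weibull_cdf(t))


for rate in (5, 50):
    print(rate, approximation_error(rate))
```

Because every Erlang component is a chain of exponential phases, the fitted mixture can be expanded into extra states of a continuous-time Markov chain, and the approximation tightens as the common rate grows.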