Research paper accepted by IEEE Transactions on Systems, Man, and Cybernetics: Systems

Neural networks that overlook the underlying causal relationships among observed variables pose significant risks in high-stakes decision-making contexts because of concerns about the robustness and stability of model performance. To tackle this issue, we present a general approach for embedding the hierarchical causal structure among observed variables into a neural network to inform its learning. The proposed methodology, termed causality-informed neural network (CINN), exploits hierarchical causal structure learned from observational data as a structurally informed prior to guide the layer-to-layer architectural design of the neural network while maintaining the orientation of causal relationships in the discovered causal graph. The proposed method involves three steps. First, CINN mines causal relationships from observational data via directed acyclic graph (DAG) learning, where causal discovery is recast as a continuous optimization problem to circumvent the combinatorial nature of DAG learning. Second, we encode the discovered hierarchical causal graph among observed variables into the neural network via a dedicated architecture and loss function. By classifying observed variables in the DAG as root, intermediate, and leaf nodes, we translate the hierarchical causal DAG into CINN by creating a one-to-one correspondence between DAG nodes and certain CINN neurons. For the loss function, both intermediate and leaf nodes in the DAG are treated as target outputs during CINN training, facilitating the co-learning of causal relationships among the observed variables. Finally, as multiple loss components emerge in CINN, we leverage the projection of conflicting gradients to mitigate gradient interference among the multiple learning tasks. Computational studies indicate that CINN outperforms several state-of-the-art methods across a broad range of datasets. In addition, an ablation study that incrementally incorporates structural and quantitative causal knowledge into the neural network is conducted to highlight the pivotal role of causal knowledge in enhancing the neural network’s prediction performance.
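Two of the steps described above, classifying DAG nodes by role and projecting conflicting gradients, can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the edge-list input format, and the fixed projection order are assumptions made for illustration (gradient-surgery methods of this kind typically visit the other tasks in random order), and gradients are plain Python lists rather than network tensors.

```python
def classify_nodes(nodes, edges):
    """Label each DAG node as 'root', 'intermediate', or 'leaf'.

    `edges` is a collection of (parent, child) pairs (assumed input format).
    """
    edges = list(edges)
    has_parent = {child for _, child in edges}
    has_child = {parent for parent, _ in edges}
    roles = {}
    for node in nodes:
        if node not in has_parent:
            roles[node] = "root"          # no incoming edge (isolated nodes land here too)
        elif node not in has_child:
            roles[node] = "leaf"          # incoming edges only
        else:
            roles[node] = "intermediate"  # both incoming and outgoing edges
    return roles


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_conflicting(grads):
    """Remove pairwise conflicts between per-task gradients.

    For each task gradient g, whenever g has a negative inner product with
    another task's original gradient, subtract the component of g along that
    gradient. Returns the projected gradients and their sum.
    """
    projected = []
    for i, grad in enumerate(grads):
        g = list(grad)
        for j, other in enumerate(grads):
            if i == j:
                continue
            inner = dot(g, other)
            if inner < 0:  # conflict: the two gradients point in opposing directions
                scale = inner / dot(other, other)
                g = [a - scale * b for a, b in zip(g, other)]
        projected.append(g)
    combined = [sum(parts) for parts in zip(*projected)]
    return projected, combined
```

For example, classify_nodes(["a", "b", "c", "d"], [("a", "c"), ("b", "c"), ("c", "d")]) labels a and b as roots, c as intermediate, and d as a leaf; in the setting described above, c and d would then be the nodes treated as training targets.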

Prof. Zhisheng Ye delivered a talk on “Optimal Abort Policy for Mission-Critical Systems under Imperfect Condition Monitoring”

While most on-demand mission-critical systems are engineered to be reliable to support critical tasks, occasional failures may still occur during missions. To increase system survivability, a common practice is to abort the mission before an imminent failure. We consider optimal mission abort for a system whose deterioration follows a general three-state (normal, defective, failed) semi-Markov chain. The failure is assumed self-revealed, while the healthy and defective states have to be predicted from imperfect condition monitoring data. Due to the non-Markovian process dynamics, optimal mission abort for this partially observable system is an intractable stopping problem. For a tractable solution, we introduce a novel tool of Erlang mixtures to approximate non-exponential sojourn times in the semi-Markov chain. This allows us to approximate the original process by a surrogate continuous-time Markov chain whose optimal control policy can be solved through a partially observable Markov decision process (POMDP). We show that the POMDP optimal policies converge almost surely to the optimal abort decision rules when the Erlang rate parameter diverges. This implies that the expected cost by adopting the POMDP solution converges to the optimal expected cost. Next, we provide comprehensive structural results on the optimal policy of the surrogate POMDP. Based on the results, we develop a modified point-based value iteration algorithm to numerically solve the surrogate POMDP. We further consider mission abort in a multi-task setting where a system executes several tasks consecutively before a thorough inspection. Through a case study on an unmanned aerial vehicle, we demonstrate the capability of real-time implementation of our model, even when the condition-monitoring signals are generated with high frequency.
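The idea of approximating a non-exponential sojourn time by an Erlang mixture with a common rate can be sketched as follows. This uses a classical textbook construction (mixture weights taken from increments of the target CDF on a grid of width 1/rate), which may differ from the construction used in the talk; the Weibull target, the truncation horizon, and all function names are assumptions for illustration only.

```python
import math


def erlang_cdf(k, rate, x):
    """CDF of an Erlang(k, rate) sojourn time, i.e. P(Poisson(rate * x) >= k)."""
    if x <= 0:
        return 0.0
    term = math.exp(-rate * x)  # Poisson probability of 0 events
    below = 0.0
    for j in range(k):
        below += term               # accumulate P(N = j) for j < k
        term *= rate * x / (j + 1)  # move on to P(N = j + 1)
    return max(0.0, 1.0 - below)


def erlang_mixture_weights(target_cdf, rate, n_components):
    """Weight of the k-th Erlang component: F(k / rate) - F((k - 1) / rate).

    Assumes the target has no probability mass at zero. The mass beyond the
    last grid point is folded into the last component so the weights sum to 1.
    """
    weights = [target_cdf(k / rate) - target_cdf((k - 1) / rate)
               for k in range(1, n_components + 1)]
    weights[-1] += 1.0 - target_cdf(n_components / rate)
    return weights


def mixture_cdf(weights, rate, x):
    """CDF of the Erlang mixture whose components share one rate."""
    return sum(w * erlang_cdf(k, rate, x) for k, w in enumerate(weights, start=1))


def weibull_cdf(x, shape=2.0, scale=1.0):
    """Hypothetical non-exponential sojourn-time distribution used as the target."""
    return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0


def max_abs_error(rate, horizon=6.0, grid_step=0.1, grid_end=3.0):
    """Largest CDF gap between the target and its Erlang-mixture approximation."""
    n_components = int(horizon * rate)
    weights = erlang_mixture_weights(weibull_cdf, rate, n_components)
    n_points = int(round(grid_end / grid_step))
    grid = [grid_step * i for i in range(1, n_points + 1)]
    return max(abs(mixture_cdf(weights, rate, x) - weibull_cdf(x)) for x in grid)
```

Comparing max_abs_error(5) with max_abs_error(50) shows the gap shrinking as the common rate grows, in line with the convergence statement above. Because every component is a chain of exponential phases, the resulting surrogate process is a continuous-time Markov chain.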

Congratulations to Jingxiao LIAO on passing his PhD oral defense!

In recent years, deep learning has achieved significant success in various fields, including natural language processing, autonomous driving, and computer vision. In the realm of prognostics and health management (PHM) for rolling bearings in rotating machinery—such as aero engines, wind turbines, and high-speed trains—numerous intelligent PHM methodologies have emerged to provide accurate and adaptable machinery fault diagnostics and prognostics. However, methodologically speaking, there is no one-size-fits-all approach. It is widely acknowledged that these data-driven approaches still possess considerable limitations, hindering their widespread adoption in industrial settings.

Three primary challenges persist: (1) the lack of interpretability in deep learning methods, particularly in machinery fault diagnosis, where diagnostic models must be transparent to foster trust in the results and inform maintenance decisions; (2) the limited generalizability and reliability of bearing remaining useful life (RUL) prediction models: when training data is scarce, current RUL models show suboptimal accuracy even under identical operating conditions and with the same bearing types, and the reliability of RUL predictions must be ensured before they can inform maintenance decisions in real-world scenarios; and (3) the difficulty of deploying intelligent diagnosis models to edge devices, which hinders their integration into real-world industrial settings.

Therefore, this dissertation aims to address these challenges by establishing a paradigm that integrates traditional signal processing with modern deep learning methods. We formally define this approach as signal processing-empowered neural networks, which synthesize the complementary strengths of both domains. This framework provides three key advantages: (1) integrating rigorous signal processing theory to improve model interpretability; (2) leveraging the robust feature representation capabilities of signal processing techniques to enhance deep learning model generalizability, together with an auxiliary exponential model to quantify the reliability of RUL predictions; and (3) enabling faster computation and greater accuracy, thereby facilitating the deployment of lightweight models on edge devices. The research contents are summarized as follows:

Research paper accepted by IEEE Transactions on Emerging Topics in Computational Intelligence

Equipping deep learning models with principled uncertainty quantification (UQ) has become essential for ensuring their reliable performance in the open world. To handle uncertainty arising from two prevalent sources in open-world settings, distribution shift and out-of-distribution (OOD) inputs, this paper presents a unified uncertainty-informed approach for quantifying and managing the risks these factors pose to the dependable functioning of deep learning models. Toward this goal, we propose leveraging a principled UQ approach, the Spectral-normalized Neural Gaussian Process (SNGP), to quantify the epistemic uncertainty associated with model predictions. Unlike other UQ methods in the literature, SNGP is characterized by two unique properties: (1) applying spectral normalization to the weights of the neural network’s hidden layers to preserve the relative distances among data points during data transformations; and (2) replacing the traditional output layer of neural networks with a Gaussian process to enable distance-aware uncertainty estimation. Based on SNGP’s uncertainty estimate, we apply Youden’s index to determine an optimal threshold for categorizing the uncertainty into distinct levels, thereby enabling decision-makers to make uncertainty-informed decisions. Two datasets of varying scale are used to demonstrate how the proposed method facilitates risk assessment and management of deep learning models in the open environment. Computational results reveal that the proposed method achieves prediction performance comparable to Monte Carlo dropout and deep ensemble methods. Importantly, the proposed approach outperforms the other two methods by providing computationally efficient, consistent, and principled uncertainty estimates under no distribution shift, distribution shift, and OOD conditions.
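The thresholding step can be illustrated with a minimal sketch of Youden's index, J = sensitivity + specificity - 1, maximized over candidate thresholds. The labeling convention below (label 1 for predictions that should be flagged, such as OOD or misclassified inputs, and label 0 otherwise) and the function name are assumptions for illustration; the abstract does not specify how the positive class is defined.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing Youden's index J = TPR - FPR.

    `scores` are uncertainty values; `labels` are 1 for cases that should be
    flagged and 0 otherwise (assumed convention). A case is flagged when its
    score is greater than or equal to the threshold.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    best_threshold, best_j = None, float("-inf")
    for t in sorted(set(scores)):  # every observed score is a candidate cut
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / positives - fp / negatives  # sensitivity + specificity - 1
        if j > best_j:
            best_threshold, best_j = t, j
    return best_threshold, best_j
```

Predictions whose uncertainty falls at or above the returned threshold could then be routed to a human reviewer, while the rest are accepted automatically.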

Research project funded by the Natural Science Foundation of Guangdong Province-General Program

Uncertainty quantification and spatiotemporal causal discovery for reliable traffic prediction

Research paper accepted by Reliability Engineering & System Safety

Multi-state systems (MSS) are widely used for modeling the behavior of engineering applications, where the system and its components can have more than two distinct states. Physics-Informed Neural Networks (PINNs) offer a viable solution for characterizing the dynamic state evolution of MSS. However, existing methods predominantly rely on uniformly sampled collocation points across the problem domain when training PINNs. Although some residual-based active learning methods exist, they are inherently static and local, and often fail to capture a crucial aspect of PINN training: identification and accurate modeling of the “critical transition regions” within the problem domain. To address this fundamental challenge, we treat the PINN as a dynamic system and introduce a novel active learning method grounded in chaos theory to identify regions within the problem domain that are highly sensitive to initial conditions. Specifically, our method quantifies the degree of chaos at candidate collocation points by introducing small perturbations and using the PINN’s forward propagation to simulate the dynamic evolution of both the original and perturbed collocation points. Collocation points that exhibit pronounced chaotic behavior, where evolutionary trajectories diverge rapidly following perturbation, are identified as the system’s most unstable and valuable regions for PINN training. By prioritizing these dynamically unstable points, our method directs the PINN to focus its learning on accurately delineating the boundaries of state transitions, thereby significantly enhancing the accuracy of reliability analysis. Experimental results on multiple benchmark partial differential equations (PDEs) and several MSSs demonstrate that, compared to other PINN learning schemes, our method shows superior accuracy and computational efficiency in MSS reliability assessment.
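The perturbation-based scoring described above can be sketched in a toy form. The abstract does not specify how the PINN's forward propagation defines the evolution of a collocation point, so the sketch takes an arbitrary one-step map `step` as a hypothetical stand-in and uses the logistic map purely for demonstration; the function names, perturbation size, and number of steps are likewise assumptions.

```python
import math


def divergence_rate(step, x0, delta=1e-6, n_steps=10):
    """Average log growth rate of a small perturbation under repeated `step`.

    Positive values mean the original and perturbed trajectories separate
    (sensitive to initial conditions); negative values mean they contract.
    """
    a, b = x0, x0 + delta
    for _ in range(n_steps):
        a, b = step(a), step(b)
    separation = max(abs(b - a), 1e-300)  # guard against log(0)
    return math.log(separation / delta) / n_steps


def select_sensitive_points(step, candidates, k, delta=1e-6, n_steps=10):
    """Keep the k candidate points whose perturbed trajectories diverge fastest."""
    ranked = sorted(candidates,
                    key=lambda x: divergence_rate(step, x, delta, n_steps),
                    reverse=True)
    return ranked[:k]


def chaotic_map(x):
    """Logistic map with r = 4: a standard example of chaotic dynamics."""
    return 4.0 * x * (1.0 - x)


def stable_map(x):
    """Logistic map with r = 2.5: trajectories settle onto a fixed point."""
    return 2.5 * x * (1.0 - x)
```

Averaged over a grid of starting points, divergence_rate is positive for chaotic_map and negative for stable_map, which is the kind of contrast a selection rule would exploit when ranking candidate collocation points.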

Prof. Cheng-Lin Liu gave a talk on “Open-World Learning: Problems and Strategies”

Traditional methods of pattern classification and machine learning usually assume a closed world: the input pattern falls within a fixed set of classes. However, in the open world, the input pattern can belong to either a known or an unknown class, or be an outlier. During training, moreover, the data may emerge incrementally, and a new dataset may contain samples of known or unknown classes, either labeled or unlabeled, as well as outliers. Such an open-world learning scenario involves multiple challenges, including out-of-distribution (OOD) detection, confidence estimation, unlabeled data exploitation, catastrophic forgetting, and novel category discovery. These challenges are addressed by combining techniques such as generative modeling, regularization, knowledge distillation, and hybrid learning. This talk will outline the status of open-world pattern recognition, identify the main challenges and strategies of open-world learning, and present some recent progress achieved in my group: open-set recognition, class-incremental learning, and generalized category discovery.

Welcoming a new PhD student to the group!

The group for risk, reliability, and resilience informatics of intelligent systems warmly welcomes Hang Ji, who joins the team to begin his PhD studies.

Research project funded by the Natural Science Foundation of Shenzhen-General Program

Uncertainty quantification and spatiotemporal causal discovery for reliable traffic prediction

Research paper accepted by INFORMS Journal on Computing

Accurate and reliable prediction has profound implications for a wide range of applications, such as hospital admissions, inventory control, and route planning. In this study, we focus on one instance of the spatio-temporal learning problem, traffic prediction, to demonstrate an advanced deep learning model developed for making accurate and reliable predictions. Despite the significant progress in traffic prediction, few studies have incorporated both explicit (e.g., road network topology) and implicit (e.g., causality-related traffic phenomena and impact of exogenous factors) traffic patterns simultaneously to improve prediction performance. Meanwhile, the variable nature of traffic states necessitates quantifying the uncertainty of model predictions in a statistically principled way; however, extant studies offer no provable guarantee on the statistical validity of confidence intervals, that is, on whether they reflect the actual likelihood of containing the ground truth. In this paper, we propose an end-to-end traffic prediction framework that leverages three primary components to generate accurate and reliable traffic predictions: dynamic causal structure learning for discovering implicit traffic patterns from massive traffic data, a causally-aware spatio-temporal multi-graph convolution network (CASTMGCN) for learning spatio-temporal dependencies, and conformal prediction for uncertainty quantification. In particular, CASTMGCN fuses several graphs that characterize different important aspects of traffic networks (including physical road structure, time-lagged causal effects, and contemporaneous causal relationships) and an auxiliary graph that captures the effect of exogenous factors on the road network. On this basis, a conformal prediction approach tailored to spatio-temporal data is further developed for quantifying the uncertainty in node-wise traffic predictions over varying prediction horizons. Experimental results on two real-world traffic datasets of varying scale demonstrate that the proposed method outperforms several state-of-the-art models in prediction accuracy; moreover, it generates more efficient prediction regions than several other methods while strictly satisfying the statistical validity in coverage.
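The coverage guarantee mentioned above can be illustrated with plain split conformal prediction, which is simpler than the spatio-temporal variant developed in the paper. The sketch assumes absolute residuals from a held-out calibration set that is exchangeable with the test data; in a node-wise, multi-horizon setting one might calibrate a separate quantile per node and horizon, but that is an assumption rather than the paper's procedure. Function names are hypothetical.

```python
import math


def conformal_quantile(calibration_residuals, alpha):
    """Finite-sample-corrected quantile of absolute calibration residuals.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest residual, which gives
    marginal coverage of at least 1 - alpha under exchangeability.
    """
    n = len(calibration_residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for the requested level
    return sorted(calibration_residuals)[rank - 1]


def prediction_interval(point_prediction, quantile):
    """Symmetric interval around a point prediction."""
    return point_prediction - quantile, point_prediction + quantile
```

With 19 calibration residuals 1, 2, ..., 19 and alpha = 0.1, the rank is ceil(20 * 0.9) = 18, so the quantile is 18 and a point prediction of 50 yields the interval (32, 68). Narrower intervals at the same coverage level are what the abstract calls more efficient prediction regions.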