Research paper accepted by IEEE Transactions on Systems, Man, and Cybernetics: Systems

Neural networks that overlook the underlying causal relationships among observed variables pose significant risks in high-stakes decision-making contexts because of concerns about the robustness and stability of model performance. To tackle this issue, we present a general approach for embedding the hierarchical causal structure among observed variables into a neural network to inform its learning. The proposed methodology, termed causality-informed neural network (CINN), exploits the hierarchical causal structure learned from observational data as a structurally informed prior to guide the layer-to-layer architectural design of the neural network while preserving the orientation of causal relationships in the discovered causal graph. The proposed method involves three steps. First, CINN mines causal relationships from observational data via directed acyclic graph (DAG) learning, where causal discovery is recast as a continuous optimization problem to circumvent the combinatorial nature of DAG learning. Second, we encode the discovered hierarchical causal graph among observed variables into the neural network via a dedicated architecture and loss function. For the architecture, we classify the observed variables in the DAG as root, intermediate, and leaf nodes and translate the hierarchical causal DAG into CINN by creating a one-to-one correspondence between DAG nodes and certain CINN neurons. For the loss function, both intermediate and leaf nodes in the DAG are treated as target outputs during CINN training, facilitating the co-learning of causal relationships among the observed variables. Finally, because CINN involves multiple loss components, we project conflicting gradients to mitigate gradient interference among the multiple learning tasks. Computational studies indicate that CINN outperforms several state-of-the-art methods across a broad range of datasets.
In addition, we conduct an ablation study that incrementally incorporates structural and quantitative causal knowledge into the neural network to highlight the pivotal role of causal knowledge in enhancing the neural network’s predictive performance.
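The three CINN steps described above can be illustrated with a minimal NumPy/SciPy sketch. It is not the authors' implementation, and several pieces are assumptions: the continuous DAG-learning formulation is taken to be the NOTEARS acyclicity function h(W) = tr(exp(W ∘ W)) - d (which is zero exactly when W describes a DAG); `acyclicity`, `node_roles`, and `pcgrad` are hypothetical helper names; and the gradient projection follows the PCGrad rule with a fixed task order instead of the random order used in the original PCGrad method.

```python
import numpy as np
from scipy.linalg import expm


def acyclicity(W):
    """NOTEARS-style constraint h(W) = tr(exp(W * W)) - d; zero iff W's graph is acyclic."""
    d = W.shape[0]
    return float(np.trace(expm(W * W)) - d)


def node_roles(A):
    """Classify DAG nodes; A[i, j] != 0 means a causal edge x_i -> x_j."""
    has_parent = (A != 0).any(axis=0)
    has_child = (A != 0).any(axis=1)
    roles = []
    for p, c in zip(has_parent, has_child):
        if not p:
            roles.append("root")          # no parents (isolated nodes also land here)
        elif c:
            roles.append("intermediate")  # has both parents and children
        else:
            roles.append("leaf")          # has parents only
    return roles


def pcgrad(grads):
    """Remove from each task gradient its component along any gradient it conflicts with, then sum.

    Assumption: tasks are visited in a fixed order for reproducibility.
    """
    projected = [np.asarray(g, dtype=float).copy() for g in grads]
    for i, g_i in enumerate(projected):
        for j, g_j in enumerate(grads):
            g_j = np.asarray(g_j, dtype=float)
            dot = g_i @ g_j
            if j != i and dot < 0:  # negative inner product means the two tasks conflict
                g_i -= dot / (g_j @ g_j) * g_j  # in-place update of projected[i]
    return np.sum(projected, axis=0)


if __name__ == "__main__":
    A = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]], dtype=float)  # x0 -> x1 -> x2, x0 -> x2
    print(acyclicity(A))  # approximately 0 because A is a DAG
    print(node_roles(A))  # ['root', 'intermediate', 'leaf']
    print(pcgrad([np.array([1.0, 0.0]), np.array([-1.0, 1.0])]))  # [0.5 1.5]
```

Under the loss design described above, the nodes labelled `intermediate` and `leaf` would be the ones supervised as outputs; how CINN wires root inputs to those neurons is not reproduced in this sketch.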

Prof. Zhisheng Ye delivered a talk on “Optimal Abort Policy for Mission-Critical Systems under Imperfect Condition Monitoring”

While most on-demand mission-critical systems are engineered to be reliable enough to support critical tasks, occasional failures may still occur during missions. To increase system survivability, a common practice is to abort the mission before an imminent failure. We consider optimal mission abort for a system whose deterioration follows a general three-state (normal, defective, failed) semi-Markov chain. Failure is assumed to be self-revealing, whereas the normal and defective states must be inferred from imperfect condition-monitoring data. Because of the non-Markovian process dynamics, optimal mission abort for this partially observable system is an intractable optimal stopping problem. To obtain a tractable solution, we introduce Erlang mixtures as a novel tool for approximating the non-exponential sojourn times in the semi-Markov chain. This allows us to approximate the original process by a surrogate continuous-time Markov chain whose optimal control policy can be obtained by solving a partially observable Markov decision process (POMDP). We show that the optimal POMDP policies converge almost surely to the optimal abort decision rules as the Erlang rate parameter diverges, which implies that the expected cost of adopting the POMDP solution converges to the optimal expected cost. Next, we provide comprehensive structural results on the optimal policy of the surrogate POMDP. Based on these results, we develop a modified point-based value iteration algorithm to solve the surrogate POMDP numerically. We further consider mission abort in a multi-task setting where a system executes several tasks consecutively before a thorough inspection. Through a case study on an unmanned aerial vehicle, we demonstrate that our model can be implemented in real time, even when the condition-monitoring signals are generated at high frequency.
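One classical way to build an Erlang-mixture approximation of a sojourn-time CDF F is Tijms' construction, assumed here because the talk abstract does not specify the fitting scheme: the weight of the Erlang(j, λ) component is the mass F places on ((j-1)/λ, j/λ]. Since an Erlang(j, λ) time is a chain of j exponential stages, the mixture is phase-type and can be embedded in a continuous-time Markov chain. The sketch below uses hypothetical helpers (`erlang_mixture_weights`, `erlang_mixture_cdf`, `max_cdf_error`) and an illustrative Weibull sojourn time to check numerically that the approximation tightens as the rate λ grows, in the spirit of the λ → ∞ limit discussed above.

```python
import numpy as np
from scipy import stats


def erlang_mixture_weights(cdf, rate, n_phases):
    """p_j = F(j/rate) - F((j-1)/rate) for j = 1..n_phases; leftover tail mass goes to the last phase."""
    F = cdf(np.arange(n_phases + 1) / rate)
    w = np.diff(F)
    w[-1] += 1.0 - F[-1]
    return w


def erlang_mixture_cdf(t, weights, rate):
    """CDF of sum_j p_j * Erlang(shape=j, rate=rate), evaluated at the times in t."""
    shapes = np.arange(1, len(weights) + 1)
    t = np.atleast_1d(np.asarray(t, dtype=float))[:, None]
    # An Erlang(j, rate) law is a gamma law with integer shape j and scale 1/rate.
    return (weights * stats.gamma.cdf(t, a=shapes, scale=1.0 / rate)).sum(axis=1)


def max_cdf_error(rate, t_max=5.0):
    """Largest CDF gap between a Weibull sojourn time and its Erlang-mixture approximation."""
    sojourn = stats.weibull_min(c=2.0)  # illustrative non-exponential sojourn time
    n_phases = int(np.ceil(rate * t_max))  # cover the support up to t_max
    w = erlang_mixture_weights(sojourn.cdf, rate, n_phases)
    grid = np.linspace(0.0, 4.0, 401)
    return float(np.max(np.abs(erlang_mixture_cdf(grid, w, rate) - sojourn.cdf(grid))))


if __name__ == "__main__":
    for rate in (2.0, 10.0, 50.0):
        print(rate, max_cdf_error(rate))  # error shrinks as the rate grows
```

A larger rate gives a closer fit but more phases, and therefore a larger surrogate state space, which is the trade-off that makes the convergence result above practically relevant.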

Congratulations to Jingxiao LIAO on passing his PhD oral defense!!!

In recent years, deep learning has achieved significant success in various fields, including natural language processing, autonomous driving, and computer vision. In prognostics and health management (PHM) for rolling bearings in rotating machinery, such as aero engines, wind turbines, and high-speed trains, numerous intelligent PHM methods have emerged to provide accurate and adaptable fault diagnostics and prognostics. However, methodologically speaking, there is no one-size-fits-all approach, and it is widely acknowledged that these data-driven approaches still have considerable limitations that hinder their widespread adoption in industrial settings.

Three primary challenges persist: (1) the lack of interpretability of deep learning methods, particularly in machinery fault diagnosis, where diagnostic models must be transparent to foster trust in the results and inform maintenance decisions; (2) the limited generalizability and reliability of bearing remaining useful life (RUL) prediction models: when training data are scarce, current RUL models achieve suboptimal accuracy even for the same bearing type under identical operating conditions, and the reliability of RUL predictions must be ensured before they can support maintenance decisions in real-world scenarios; and (3) the difficulty of deploying intelligent diagnosis models on edge devices, which hinders their integration into real-world industrial settings.

Therefore, this dissertation addresses these challenges by establishing a paradigm that integrates traditional signal processing with modern deep learning. We formally define this approach as signal processing-empowered neural networks, which combine the complementary strengths of both domains. This framework offers three key advantages: (1) it incorporates rigorous signal processing theory to improve model interpretability; (2) it leverages the robust feature representation capabilities of signal processing techniques to enhance the generalizability of deep learning models, and it uses an auxiliary exponential model to quantify the reliability of RUL predictions; and (3) it enables faster computation and higher accuracy, thereby facilitating the deployment of lightweight models on edge devices. The research contents are summarized as follows: