Research paper accepted by Nature Communications

Industrial enterprises are prominent sources of contaminant discharge on the planet and regulating their operations is vital for sustainable development. However, accurately tracking contaminant generation at the firm-level remains an intractable global issue due to significant heterogeneities among enormous enterprises and the absence of a universally applicable estimation method. This study addressed the challenge by focusing on hazardous waste (HW), known for its severe harmful properties and difficulty in automatic monitoring, and developed a data-driven methodology that predicted HW generation utilizing wastewater big data in a uniform and lightweight manner. The idea is grounded in the availability of wastewater big data with widespread application of automatic sensors, enabling depiction of heterogeneous enterprises, and the logical assumption that a correlation exists between wastewater and HW generation. We simulated this relationship by designing a generic framework that jointly used representative variables from diverse sectors, exploited a data-balance algorithm to address long-tail data distribution, and incorporated causal discovery to screen features and improve computation efficiency. To illustrate our approach, we applied it to 1024 enterprises across 10 sectors in Jiangsu, a highly industrialized province in China. Validation results demonstrated the model’s high fidelity (R2=0.87) in predicting HW generation using 4,260,593 daily wastewater data.

Research paper accepted by Risk Analysis

In this paper, we develop a generic framework for systemically encoding causal knowledge manifested in the form of hierarchical causality structure and qualitative (or quantitative) causal relationships into neural networks to facilitate sound risk analytics and decision support via causally-aware intervention reasoning. The proposed methodology for establishing causality-informed neural network (CINN) follows a four-step procedure. In the first step, we explicate how causal knowledge in the form of directed acyclic graph (DAG) can be discovered from observation data or elicited from domain experts. Next, we categorize nodes in the constructed DAG representing causal relationships among observed variables into several groups (e.g., root nodes, intermediate nodes, leaf nodes), and align the architecture of CINN with causal relationships specified in the DAG while preserving the orientation of each existing causal relationship. In addition to a dedicated architecture design, CINN also gets embodied in the design of loss function, where both intermediate and leaf nodes are treated as target outputs to be predicted by CINN. In the third step, we propose to incorporate domain knowledge on stable causal relationships into CINN, and the injected constraints on causal relationships act as guardrails to prevent unexpected behaviours of CINN. Finally, the trained CINN is exploited to perform intervention reasoning with emphasis on estimating the effect that policies and actions can have on the system behavior, thus facilitating risk-informed decision making through comprehensive “what-if” analysis. Two case studies are used to demonstrate the substantial benefits enabled by CINN in risk analytics and decision support.