“What if I am not measuring all the potential root causes?” is a question we frequently hear from industry experts. Comprehensive data is important for problem-solving, but capturing every root cause is infeasible. This article illustrates that robust algorithms for root cause analysis can uncover significant production issues even when other effects remain unexplained.
Why the real world differs from theory
Any root cause analysis starts with aggregating the right set of data. In an ideal world, we would measure all variables that influence an outcome variable of interest (e.g., quality). Such holistic data coverage would enable an AI-based analysis to identify all drivers of production issues. In reality, however, some root causes are not measured or are not reflected in the data at all. Consider, for instance, a missing temperature sensor in an injection molding machine or unmeasured sources of particles in a semiconductor fabrication process. In such scenarios, even the most advanced AI algorithms cannot directly reveal root causes that are absent from the data.
There is a common misconception that data must be perfect for AI-based root cause analysis to be effective. This is not the case. Unmeasured variables do limit how far a process can be improved, but useful insights can still be gained from the data that most manufacturers collect today. Unexplained variation does not make such analyses worthless: imperfect models can still enhance process understanding. In this article, we explore an example showing how robust algorithms can reliably identify key root causes amid unexplained variation, even when some sensors are missing.
Simulated production setup
We introduce a practical case for our root cause analysis by simulating data for five sensor measurements and a quality metric, the yield, across 10,000 production batches. The goal is to uncover the root causes of yield losses from these sensor measurements. The relationship between the sensor measurements and the yield is captured by the following formula:
Here’s a breakdown of the above formula:
- The ideal value of Sensor 1 measurement is 100. Deviations from this value reduce the yield.
- The ideal value of Sensor 2 measurement is 20. Deviations from this value reduce the yield.
- The ideal value of Sensor 3 measurement is 50. Deviations from this value reduce the yield.
- Sensor 4 measurement and Sensor 5 measurement have no impact on the yield.
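The following Python sketch simulates comparable data under an assumed functional form that is consistent with the breakdown above: the yield starts at 100 and drops quadratically with the deviation of Sensors 1 to 3 from their ideal values, while Sensors 4 and 5 are drawn but never used. The penalty weights, the sensor distributions, the noise-free yield, and names such as `simulate_batches` are hypothetical choices for illustration, not the article's actual simulation.

```python
import numpy as np

# Ideal set points from the article; every other constant below is an assumption.
IDEAL = {"sensor_1": 100.0, "sensor_2": 20.0, "sensor_3": 50.0}
# Hypothetical penalty weights applied to the squared deviation from the ideal value.
WEIGHTS = {"sensor_1": 0.04, "sensor_2": 0.25, "sensor_3": 0.0625}
# Hypothetical sensor distributions as (mean, standard deviation).
DISTRIBUTIONS = {
    "sensor_1": (100.0, 5.0),
    "sensor_2": (20.0, 2.0),
    "sensor_3": (50.0, 4.0),
    "sensor_4": (0.0, 1.0),
    "sensor_5": (10.0, 1.0),
}


def simulate_batches(n_batches=10_000, seed=0):
    """Draw independent sensor readings and compute a noise-free yield per batch."""
    rng = np.random.default_rng(seed)
    sensors = {
        name: rng.normal(mean, sd, n_batches)
        for name, (mean, sd) in DISTRIBUTIONS.items()
    }
    yields = np.full(n_batches, 100.0)
    for name, ideal in IDEAL.items():
        # Any deviation from the ideal value reduces the yield.
        yields -= WEIGHTS[name] * (sensors[name] - ideal) ** 2
    # Sensors 4 and 5 never enter the yield, so they have no impact on it.
    return sensors, yields


sensors, yields = simulate_batches()
print(f"{len(yields)} batches, mean yield {yields.mean():.2f}, max {yields.max():.2f}")
```

With these assumed weights and spreads, each of the three root causes contributes roughly the same variance to the yield, which keeps the effect of omitting a sensor easy to interpret.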
The figure below displays the distributions of the five sensor measurements and the production yield. Our goal is to identify the sensor measurements that cause yield variation by using the EthonAI Analyst software. The EthonAI Analyst employs causal algorithms to pinpoint the root causes behind production issues. Importantly, we approach this analysis as if the above ground-truth function linking sensor measurements to yield were unknown.
In the following, we systematically omit sensor measurements from our dataset and observe how the root cause analysis outcomes change. This tests the robustness of the analysis by checking whether it still accurately identifies all effects that remain measured. Although the effects of unmeasured sensors cannot be accounted for, we show that models built on incomplete data can still significantly improve our understanding of the process.
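In a simulation the ground truth is known, so the share of yield variation that an omitted sensor turns into unexplained variation can be computed directly. The Python sketch below assumes a hypothetical quadratic yield model (the penalty weights, spreads, and the helper name `unexplained_share` are illustrative assumptions) and reports the share of yield variance carried by the omitted sensors' terms. Sensors 4 and 5 are left out because they have no effect. Because the sensors are drawn independently here, the omitted terms simply add to the unexplained variance; if an unmeasured root cause were correlated with a measured sensor, this clean separation would no longer hold.

```python
import numpy as np

rng = np.random.default_rng(0)
n_batches = 10_000

# Hypothetical setup per root cause: (ideal value, penalty weight, standard deviation).
ROOT_CAUSES = {
    "sensor_1": (100.0, 0.04, 5.0),
    "sensor_2": (20.0, 0.25, 2.0),
    "sensor_3": (50.0, 0.0625, 4.0),
}

# Each root cause contributes a non-positive term to the yield.
terms = {}
for name, (ideal, weight, sd) in ROOT_CAUSES.items():
    readings = rng.normal(ideal, sd, n_batches)
    terms[name] = -weight * (readings - ideal) ** 2
yields = 100.0 + sum(terms.values())


def unexplained_share(omitted):
    """Fraction of yield variance carried by the terms of the omitted sensors."""
    omitted_effect = sum(terms[name] for name in omitted)  # 0 if nothing is omitted
    return float(np.var(omitted_effect) / np.var(yields))


for omitted in [(), ("sensor_1",), ("sensor_1", "sensor_2")]:
    label = ", ".join(omitted) or "none"
    print(f"omitted {label}: {unexplained_share(omitted):.0%} of yield variance unexplained")
```

Under these assumptions, omitting Sensor 1 leaves roughly a third of the yield variance unexplained, and omitting Sensors 1 and 2 leaves roughly two thirds.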
Numerical experiments for unmeasured root causes
In the first scenario, all sensors are operational. Here, the EthonAI Analyst should detect all root causes accurately, since all relevant information is contained in the dataset. After analyzing the data, the EthonAI Analyst presents a ranking of the sensor measurements based on their impact. The first three sensor measurements are correctly identified as root causes, whereas the Sensor 4 and Sensor 5 measurements receive no attribution. The resulting root cause model is therefore an accurate approximation of the ground-truth relationships.
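The EthonAI Analyst's causal algorithms are not described in this article, so the following Python sketch swaps in a much simpler and plainly different technique to illustrate what an attribution ranking can look like: an additive least-squares model with a linear and a quadratic term per centered sensor, where each sensor's score is the variance of its fitted contribution. The quadratic yield model, its constants, and the function name `rank_sensors` are hypothetical, and the sketch only works this cleanly because the assumed effects are additive and quadratic.

```python
import numpy as np

# Hypothetical simulation: (mean, standard deviation, ideal value or None, weight).
SENSORS = {
    "sensor_1": (100.0, 5.0, 100.0, 0.04),
    "sensor_2": (20.0, 2.0, 20.0, 0.25),
    "sensor_3": (50.0, 4.0, 50.0, 0.0625),
    "sensor_4": (0.0, 1.0, None, 0.0),   # no effect on yield
    "sensor_5": (10.0, 1.0, None, 0.0),  # no effect on yield
}

rng = np.random.default_rng(0)
data = {name: rng.normal(mean, sd, 10_000) for name, (mean, sd, _, _) in SENSORS.items()}
yields = 100.0 - sum(
    weight * (data[name] - ideal) ** 2
    for name, (_, _, ideal, weight) in SENSORS.items()
    if ideal is not None
)


def rank_sensors(data, yields):
    """Fit yield ~ intercept + sum_i (b_i * d_i + c_i * d_i**2), where d_i is the
    centered sensor value, and score each sensor by the variance of its fitted term."""
    names = list(data)
    centered = [data[name] - data[name].mean() for name in names]
    columns = [np.ones_like(yields)]
    for d in centered:
        columns += [d, d ** 2]
    coef, *_ = np.linalg.lstsq(np.column_stack(columns), yields, rcond=None)
    scores = {}
    for i, (name, d) in enumerate(zip(names, centered)):
        linear, quadratic = coef[1 + 2 * i], coef[2 + 2 * i]
        scores[name] = float(np.var(linear * d + quadratic * d ** 2))
    # Highest score first, mimicking a root cause ranking.
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))


for name, score in rank_sensors(data, yields).items():
    print(f"{name}: {score:.3f}")
```

Because the simulated yield contains no noise, the fit is essentially exact: Sensors 1 to 3 receive clearly positive scores, and Sensors 4 and 5 receive scores that are zero up to floating-point error.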
In the next scenario, we remove Sensor 1 from our dataset without changing the rest of the data. Consequently, the effect of the Sensor 1 measurement becomes unexplained variation in the yield. A good root cause analysis should nevertheless still detect the Sensor 2 and Sensor 3 measurements as root causes of yield losses. As the root cause ranking below shows, the EthonAI Analyst still gives the same weight to the Sensor 2 and Sensor 3 measurements. The magnitudes of their effects are also close to those in the previous root cause model, where the entire variation in the yield could be explained.
In the final scenario, we remove both Sensor 1 and Sensor 2 from our dataset without changing the rest of the data. Now two of the three root causes are unmeasured, which leaves a large portion of the yield variation unexplained. Analyzing the data with the EthonAI Analyst still produces the expected result: the Sensor 3 measurement is detected as a root cause, and the magnitude of its effect is comparable to that in the root cause model where the entire variation could be explained.
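The three scenarios can be mimicked on a hypothetical quadratic simulation with the Python sketch below. It drops sensor columns while leaving every row unchanged, refits a simple additive quadratic least-squares model (a stand-in technique, not the EthonAI Analyst), and reports the per-sensor scores together with the explained share of yield variance (R²). All constants and the function name `fit_scenario` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
# Hypothetical sensors: (ideal value, penalty weight, standard deviation).
setup = {"sensor_1": (100.0, 0.04, 5.0), "sensor_2": (20.0, 0.25, 2.0),
         "sensor_3": (50.0, 0.0625, 4.0), "sensor_4": (0.0, 0.0, 1.0),
         "sensor_5": (10.0, 0.0, 1.0)}
data = {name: rng.normal(ideal, sd, n) for name, (ideal, _, sd) in setup.items()}
# Zero weights make Sensors 4 and 5 irrelevant to the yield.
yields = 100.0 - sum(w * (data[name] - ideal) ** 2 for name, (ideal, w, _) in setup.items())


def fit_scenario(columns):
    """Additive quadratic least-squares fit on the given sensors only.
    Returns per-sensor scores (variance of the fitted term) and R^2."""
    design = [np.ones(n)]
    for name in columns:
        d = data[name] - data[name].mean()
        design += [d, d ** 2]
    X = np.column_stack(design)
    coef, *_ = np.linalg.lstsq(X, yields, rcond=None)
    scores = {name: float(np.var(X[:, 1 + 2 * i:3 + 2 * i] @ coef[1 + 2 * i:3 + 2 * i]))
              for i, name in enumerate(columns)}
    r2 = 1.0 - np.var(yields - X @ coef) / np.var(yields)
    return scores, float(r2)


scenarios = {
    "all sensors": ["sensor_1", "sensor_2", "sensor_3", "sensor_4", "sensor_5"],
    "without Sensor 1": ["sensor_2", "sensor_3", "sensor_4", "sensor_5"],
    "without Sensors 1 and 2": ["sensor_3", "sensor_4", "sensor_5"],
}
results = {label: fit_scenario(cols) for label, cols in scenarios.items()}
for label, (scores, r2) in results.items():
    ranked = ", ".join(f"{k}={v:.2f}" for k, v in sorted(scores.items(), key=lambda kv: -kv[1]))
    print(f"{label}: R^2={r2:.2f} | {ranked}")
```

In this sketch, R² falls by roughly a third with each omitted root cause, while the scores of the remaining root causes stay close to their full-data values, because the omitted effects are independent of the measured sensors and behave like additional noise.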
Conclusion
While comprehensive data collection is important for effective root cause analysis, this article has shown that it is not necessary to measure every variable before starting. Robust algorithms can uncover significant production issues even with incomplete data. In practice, some root causes often remain unmeasured or unaccounted for, for example because a sensor is missing. Despite these limitations, AI-based analyses can still provide valuable insights that enhance process understanding and support KPI improvements.
Through numerical experiments, we have illustrated the effectiveness of the EthonAI Analyst software in identifying root causes, even when sensor data is systematically removed. In our simulations, the EthonAI Analyst accurately identified the key root causes both when all sensors were operational and when sensors were deliberately omitted. Its ability to maintain accurate root cause models with incomplete data indicates that it can be relied on in real-world production settings, where incomplete data is the norm.
In our experience, a significant portion of problems can be addressed with the data that manufacturers already collect today. Starting root cause analysis early not only helps solve problems but also guides decisions about where to deploy additional sensors. Often, unmeasured relationships can be approximated using proxies (e.g., machine IDs). For example, adding routing information (i.e., how individual units flow through production) can already point process experts to the sources of problems, such as suspicious machines. Our advice is clear: don’t wait for perfect data before embarking on data-driven analysis. Start with what you have, and progressively enhance data coverage and quality to drive continuous improvement in your manufacturing processes.
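As a hedged illustration of the proxy idea, the Python sketch below simulates units routed through one of four machines (hypothetical IDs `M1` to `M4`), where an unmeasured fault lowers the yield on one machine. No sensor captures the fault, yet grouping yields by machine ID points directly at the suspicious machine. All numbers and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
machine_ids = np.array(["M1", "M2", "M3", "M4"])  # hypothetical routing step
n_units = 10_000

# Each unit is routed through one machine; "M3" has an unmeasured fault (assumption).
route = rng.integers(0, len(machine_ids), n_units)
yields = rng.normal(95.0, 1.0, n_units) - 2.0 * (machine_ids[route] == "M3")

# Mean yield per machine: a simple proxy analysis over the routing information.
counts = np.bincount(route, minlength=len(machine_ids))
mean_yield = np.bincount(route, weights=yields, minlength=len(machine_ids)) / counts
for machine, value, count in zip(machine_ids, mean_yield, counts):
    print(f"{machine}: mean yield {value:.2f} over {count} units")

suspicious = machine_ids[np.argmin(mean_yield)]
print("Most suspicious machine:", suspicious)
```

In a real line, such a comparison would only flag a machine for closer inspection; process experts still need to confirm what actually goes wrong there.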