
What data is needed for AI-based root cause analysis?

by Dr. Peter Kaspar

In manufacturing, one concern unites everyone from line workers to top management: the quality of the goods produced. Continuous improvement of the baseline quality and fast reaction to quality issues are the keys to success. AI-based root cause analysis is the essential tool for effective quality management, and the data is the fuel. However, what kind of data is needed for effective root cause analysis in manufacturing? This article provides an overview.

Quality Metrics

A common hurdle in quality management, surprisingly, lies in establishing a robust quality metric. If we cannot accurately measure how good the quality is, we cannot monitor its stability nor can we judge whether our improvement actions are successful.

Although we learned in Six Sigma training how to qualify a measurement system with a gauge R&R study (typically evaluated with ANOVA), there is no guarantee that we can set up such a measurement in practice. And even if we have a solid measurement set up, how many of us have ever worked with rock-solid pass/fail criteria? And how many of us have always resisted the temptation to ask “can’t we just move the spec limits a bit?” when we had to solve a quality issue? The first step towards data-driven root cause analysis should always be to make sure that we have a quality metric that we can trust and that has a fixed target value.
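
As a rough illustration of what "a metric we can trust" means in numbers, the sketch below estimates how much of the observed variation comes from the measurement itself. It is a simplified repeatability check, not a full gauge R&R study: it assumes a single operator, the same number of repeated readings per part, and the function name and sample readings are made up for this example.

```python
from statistics import mean, variance


def repeatability_share(readings_by_part):
    """Estimate the share of total variance caused by measurement repeatability.

    readings_by_part: {part_id: [repeated readings]}, same number of readings per part.
    Returns (repeatability_variance, part_variance, repeatability_share).
    """
    groups = list(readings_by_part.values())
    n_repeats = len(groups[0])
    if len(groups) < 2 or n_repeats < 2 or any(len(g) != n_repeats for g in groups):
        raise ValueError("need at least 2 parts with the same number (>= 2) of readings each")
    # Pooled within-part variance: scatter of repeated readings of the same part.
    repeat_var = mean(variance(g) for g in groups)
    # The variance of the part means still contains repeat_var / n_repeats of noise.
    part_var = max(0.0, variance([mean(g) for g in groups]) - repeat_var / n_repeats)
    total = repeat_var + part_var
    share = repeat_var / total if total > 0 else 0.0
    return repeat_var, part_var, share


# Hypothetical data: three parts, each measured three times.
readings = {
    "part_A": [10.0, 10.2, 9.8],
    "part_B": [12.0, 12.1, 11.9],
    "part_C": [8.0, 8.1, 7.9],
}
repeat_var, part_var, share = repeatability_share(readings)
print(f"repeatability share of total variance: {share:.2%}")
```

A small share suggests that differences between parts dominate over measurement noise. A large share would be a reason to improve the measurement before using it as the target of a root cause analysis.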

Process Data

The second challenge is a simple but sometimes overlooked fact: the best algorithm will fail to find a root cause if that root cause hasn’t left its traces in the data that we use for the analysis. Collecting a bunch of production data and throwing it into an AI tool can lead to interesting insights, but if the AI only finds meaningless relations, it may well be because there was nothing useful to be found in the data.

In that case it makes sense to take a step back and ask: what kind of issues have we solved in the past? Would these issues have been detectable with the available data? If not, can we add a sensor that records the missing information? Expert knowledge and domain knowledge can often be worked into the data collection by linking data from different sources. The more expert knowledge goes into data collection, the more straightforward it becomes to translate the results of an AI-driven root cause analysis into an improvement action.

[Figure: where in a production flow the EthonAI Analyst collects input and output data]

Linking of Data

Now that we have quality data and process data, they must be linked together. It is not enough to know that the temperature in equipment A was 45°C and that the raw material came from Supplier B; we need to know which of the products that end up in the quality check were affected by these process conditions. Some manufacturers use unique batch IDs inscribed on their products, some use RFID tags to track them, but sometimes we simply have a linear flow of products without any identification. In this case, we can rely on timestamps and the knowledge of the time delay between process and quality check. There can be some uncertainties in this timestamp matching, but in most cases the AI algorithms are sufficiently robust to handle them.
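
The timestamp matching described above can be sketched as follows. This is a minimal illustration, not the method of any particular tool: it assumes a known, constant delay between the process step and the quality check, and the function name, field names, and tolerance are hypothetical.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def link_by_timestamp(process_records, quality_records, delay, tolerance):
    """Attach to each quality record the process record closest to (quality time - delay).

    process_records: list of (timestamp, process conditions), sorted by timestamp.
    quality_records: list of (timestamp, quality result).
    Returns a list of (quality result, process conditions or None if no match).
    """
    times = [t for t, _ in process_records]
    linked = []
    for q_time, q_result in quality_records:
        target = q_time - delay
        i = bisect_left(times, target)
        # The nearest record is one of the two neighbours of the insertion point.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        best = min(candidates, key=lambda j: abs(times[j] - target), default=None)
        if best is not None and abs(times[best] - target) <= tolerance:
            linked.append((q_result, process_records[best][1]))
        else:
            linked.append((q_result, None))  # no process record close enough
    return linked


# Hypothetical data: the quality check happens about 10 minutes after the process step.
t0 = datetime(2024, 1, 1, 8, 0, 0)
process = [
    (t0, {"temperature_c": 45.0, "supplier": "B"}),
    (t0 + timedelta(minutes=1), {"temperature_c": 47.5, "supplier": "B"}),
]
quality = [
    (t0 + timedelta(minutes=10, seconds=5), "pass"),
    (t0 + timedelta(minutes=11, seconds=2), "fail"),
    (t0 + timedelta(minutes=30), "pass"),
]
linked = link_by_timestamp(
    process, quality, delay=timedelta(minutes=10), tolerance=timedelta(seconds=20)
)
for result, conditions in linked:
    print(result, conditions)
```

The tolerance makes the uncertainty explicit: a quality record without a sufficiently close process record stays unlinked instead of being matched to the wrong product. For larger tables, the `merge_asof` function in pandas offers a comparable nearest-timestamp join.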

Routing History

There are many production setups in which multiple machines can perform the same task and, depending on availability, one machine or another is used for a given product. In this case, the routing information is highly valuable data for root cause analysis. Even if the equipment is too old to produce and transmit data about process conditions, the simple fact that the machine was used for many of the failed products can give a crucial hint to the process engineers, who can then track down and fix the issue.
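
Even without any sensor data, routing records alone allow a simple comparison of failure rates per machine. The sketch below assumes that each product is recorded as a (machine, passed) pair; the machine names and counts are invented for illustration.

```python
from collections import defaultdict


def failure_rate_by_machine(records):
    """records: iterable of (machine_id, passed) tuples.

    Returns {machine_id: (n_total, n_failed, failure_rate)}.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for machine, passed in records:
        totals[machine] += 1
        if not passed:
            failures[machine] += 1
    return {m: (totals[m], failures[m], failures[m] / totals[m]) for m in totals}


# Hypothetical routing history: two interchangeable machines for the same step.
routing = (
    [("M1", True)] * 48 + [("M1", False)] * 2
    + [("M2", True)] * 40 + [("M2", False)] * 10
)
rates = failure_rate_by_machine(routing)
# List the machines with the highest failure rate first.
for machine, (n, failed, rate) in sorted(rates.items(), key=lambda kv: -kv[1][2]):
    print(f"{machine}: {failed}/{n} failed ({rate:.0%})")
```

A difference like this is a hint, not a proof: with few products per machine, or if the machines handle different product mixes, the gap can arise by chance or through confounding and should be checked by the process engineers.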

Process Sequence 

Lastly, sophisticated root cause analysis tools leverage information on how the products flow through the sequence of process steps to deduce causal relationships and map out chains of effects. Providing these tools with chronological process sequences can rule out irrelevant causal connections, enhancing both the speed and reliability of the analysis.
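
One way in which the process sequence rules out causal connections is temporal order: a variable recorded after the step where a defect is measured cannot have caused that defect. The sketch below shows only this filtering idea, with hypothetical step and variable names; it is not a description of how any specific tool implements causal analysis.

```python
def plausible_causes(process_sequence, candidate_variables, effect_step):
    """Keep only variables recorded at or before the step where the effect is measured.

    process_sequence: list of step names in chronological order.
    candidate_variables: list of (variable_name, step_name) pairs.
    """
    order = {step: i for i, step in enumerate(process_sequence)}
    cutoff = order[effect_step]
    return [name for name, step in candidate_variables if order[step] <= cutoff]


# Hypothetical production flow and candidate variables.
sequence = ["molding", "coating", "curing", "inspection", "packaging"]
candidates = [
    ("mold_temperature", "molding"),
    ("oven_humidity", "curing"),
    ("box_weight", "packaging"),  # recorded after inspection, so it is dropped
]
kept = plausible_causes(sequence, candidates, effect_step="inspection")
print(kept)
```

Reducing the candidate set in this way leaves fewer relationships to test, which is where the gain in speed and reliability comes from.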


When embarking on the journey of AI-based root cause analysis in manufacturing, remember these key points: 

  • prioritize a robust quality metric, 
  • integrate expert knowledge in data collection, 
  • establish clear links between process and quality data, 
  • value routing information, 
  • and utilize chronological process information. 

By focusing on these areas, manufacturers can significantly enhance their quality management processes, leading to operational excellence and sustained success.

Dr. Peter Kaspar

Peter Kaspar is the Product Owner of the EthonAI Analyst software. He holds a PhD in physics from ETH Zurich. Peter has 10 years of industry experience in semiconductor manufacturing, where he held roles in R&D, product engineering, and data analytics.