
The effect of unmeasured root causes in problem-solving

“What if I am not measuring all the potential root causes?” is a question we frequently hear from industry experts. While comprehensive data is important for problem-solving, capturing every root cause is infeasible. This article illustrates that robust algorithms for root cause analysis can uncover significant production issues even amidst other unexplained effects.

Why the real world differs from theory

Any root cause analysis starts with aggregating the right set of data. In an ideal world, we would measure all variables that influence an outcome variable of interest (e.g., quality). Such holistic data coverage would enable an AI-based analysis to identify all drivers of production issues. However, reality often presents challenges with some root causes not being measurable or reflected in the data. For instance, consider a missing temperature sensor in an injection molding machine or unmeasured sources of particles in a semiconductor fabrication process. In such scenarios, even the most advanced AI algorithms cannot directly reveal the unexplained root causes.

There is a common misconception that for AI-based root cause analysis to be effective, the data must be perfect. This is not the case. While it is true that unmeasured variables limit the ability to make process improvements, useful insights can still be gained from the data that most manufacturers collect today. The presence of unexplained variation does not preclude the value of such analyses. Imperfect models can still enhance process understanding. In this article, we will explore an example demonstrating how, despite the absence of some sensors, robust algorithms are capable of reliably identifying key root causes amidst unexplained variation.

Simulated production setup

We introduce a practical case for our root cause analysis by simulating data for five sensor measurements and a quality metric defined as yield. Our simulation aims to uncover root causes of yield losses using data from these sensor measurements across 10,000 production batches. The relationship between the sensor measurements and the yield is captured by the following formula:

Here’s a breakdown of the above formula:

  • The ideal value of Sensor 1 measurement is 100. Deviations from this value reduce the yield.
  • The ideal value of Sensor 2 measurement is 20. Deviations from this value reduce the yield.
  • The ideal value of Sensor 3 measurement is 50. Deviations from this value reduce the yield.
  • Sensor 4 measurement and Sensor 5 measurement have no impact on the yield.
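
Since the ground-truth formula itself is not reproduced here, the following sketch simulates a dataset with the same qualitative structure. The distributions and penalty coefficients are assumptions for illustration, not the article’s actual simulation; only the ideal values (100, 20, 50) and the two irrelevant sensors match the description above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Five sensor measurements; only the first three influence the yield.
s1 = rng.normal(100, 5, n)   # ideal value 100
s2 = rng.normal(20, 2, n)    # ideal value 20
s3 = rng.normal(50, 4, n)    # ideal value 50
s4 = rng.normal(10, 1, n)    # no impact on yield
s5 = rng.normal(70, 6, n)    # no impact on yield

# Assumed yield function: a quadratic penalty for deviations from each
# ideal value (coefficients are illustrative, not the article's).
yield_ = (1.0
          - 0.002 * ((s1 - 100) / 5) ** 2
          - 0.002 * ((s2 - 20) / 2) ** 2
          - 0.002 * ((s3 - 50) / 4) ** 2)
yield_ = np.clip(yield_, 0.0, 1.0)
```

By construction, Sensor 4 and Sensor 5 are statistically independent of the yield, which is exactly what a sound root cause analysis should report.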

The figure below displays the distributions of the five sensor measurements and the production yield. Our goal is to identify the sensor measurements that cause yield variation using the EthonAI Analyst software. The EthonAI Analyst employs causal algorithms to pinpoint the root causes behind production issues. Importantly, we approach this analysis as if the ground-truth function linking sensor measurements to yield were unknown.

In the following, we will systematically omit sensor measurements from our dataset to observe any changes in root cause analysis outcomes. This approach tests the robustness of our analysis, ensuring it can still accurately identify all measured effects. Despite the inability to account for unmeasured sensors, we demonstrate that even models with incomplete data can significantly improve our understanding of the process.

Numerical experiments for unmeasured root causes

In the first scenario, we investigate the situation where all sensors are operational. Here, the EthonAI Analyst should be able to detect all root causes accurately, as all information is contained in the dataset. Upon analyzing the data, the EthonAI Analyst presents a ranking of sensor measurements based on their impact. The first three sensor measurements are correctly identified as root causes, whereas Sensor 4 measurement and Sensor 5 measurement receive no attribution. The resulting root cause model is therefore an accurate approximation of the ground-truth relationships.

In the next scenario, we remove Sensor 1 from our dataset without changing the rest of the data. The effect of Sensor 1 measurement therefore shows up as unexplained variation in the yield. However, a good root cause analysis should still detect the measurements of Sensor 2 and Sensor 3 as root causes of yield losses. As the root cause ranking below shows, the EthonAI Analyst still gives the same weight to Sensor 2 measurement and Sensor 3 measurement. The magnitude of the effect is also close to that of the previous root cause model, where the entire variation in the yield could be explained.

In the final scenario, we remove both Sensor 1 and Sensor 2 from our dataset without changing the rest of the data. Now two of the three root causes cannot be explained, which leaves a large portion of the variation unexplained. We analyze the data with the EthonAI Analyst and still obtain the expected results. In particular, Sensor 3 measurement is detected as a root cause, and its magnitude is comparable to that in the root cause model where the entire variation could be explained.
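
The robustness these scenarios describe can be reproduced with a minimal stand-in: an ordinary least-squares fit on squared sensor deviations (a simplification for illustration, not the EthonAI Analyst’s causal algorithms; the data-generating process and coefficients below are likewise assumptions). Because the sensors are independent, the estimated effect of Sensor 3 barely changes when Sensors 1 and 2 are dropped from the dataset:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
s1, s2, s3 = rng.normal(100, 5, n), rng.normal(20, 2, n), rng.normal(50, 4, n)

# Assumed yield: each sensor's squared deviation from its ideal value
# lowers the yield by a factor of 0.002 (illustrative ground truth).
d1, d2, d3 = ((s1 - 100) / 5) ** 2, ((s2 - 20) / 2) ** 2, ((s3 - 50) / 4) ** 2
y = 1.0 - 0.002 * d1 - 0.002 * d2 - 0.002 * d3 + rng.normal(0, 0.001, n)

def effect_of_s3(feature_columns):
    """Least-squares estimate of Sensor 3's effect, given available features."""
    X = np.column_stack(feature_columns + [np.ones(n)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[len(feature_columns) - 1]  # s3 is always the last feature

full    = effect_of_s3([d1, d2, d3])  # all sensors measured
partial = effect_of_s3([d3])          # Sensors 1 and 2 removed

print(full, partial)  # both ≈ -0.002, the true effect
```

The key condition is that the omitted root causes are not correlated with the remaining ones; then their effect ends up as unexplained noise rather than as bias in the estimates that remain.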


This article has demonstrated that while comprehensive data collection is valuable for effective root cause analysis, it is not necessary to measure every variable before getting started. Robust algorithms can uncover significant production issues even when faced with incomplete data. The real world often presents challenges where certain root causes remain unmeasured or unaccounted for, such as missing sensors. Despite these limitations, AI-based analyses can still provide valuable insights, which enhance process understanding and facilitate KPI improvements.

Through numerical experiments, we have illustrated the effectiveness of the EthonAI Analyst software in identifying root causes, even when sensor data is systematically removed. Our simulations revealed that the EthonAI Analyst accurately identified key root causes in scenarios where all sensors were operational, as well as in scenarios where sensors were deliberately omitted. Importantly, the Analyst’s ability to maintain accurate root cause models, even with incomplete data, underscores its reliability in real-world production settings.

In our experience, a significant portion of problems can be addressed by the existing data manufacturers collect today. Initiating root cause analysis early not only aids in problem-solving but also guides decisions regarding sensor deployment. Often, unmeasured relationships can be approximated using proxies (e.g., machine IDs). For example, adding routing information (i.e., how individual units flow through production) can already point process experts to the sources of problems (e.g., towards suspicious machines). Our advice is clear: don’t wait for perfect data before embarking on data-driven analysis. Start with what you have, and progressively enhance data coverage and quality to drive continuous improvement in your manufacturing processes.

Deploying a Manufacturing Analytics System: On-premises vs. cloud-based solutions

A Manufacturing Analytics System (MAS) integrates across data sources and provides valuable insights into production processes. As companies evaluate their options, a key decision emerges: should they deploy the MAS on their own premises, or opt for a cloud-based Software as a Service (SaaS) solution?

This article discusses the merits of each approach to help businesses make an informed decision. It focuses on five major discussion points: data security, scalability, maintenance, cost effectiveness, and support.

Data Security and Compliance

On-Premises: Enhanced Control and Security

The primary advantage of an on-premises deployment lies in the enhanced control and security it offers. Companies with highly sensitive data often prefer on-premises solutions due to their stringent security requirements. It can be easier to conform to stringent or inflexible policies by hosting the MAS internally. This setup allows for a more hands-on approach to data management, ensuring compliance with standards like GDPR, HIPAA, NIST, or other industry-specific regulations.

Cloud-Based Solutions: Robust, Standardized Security

Cloud-based MAS solutions have often been perceived as less secure, and some companies generally distrust the cloud. However, especially in recent years, cloud offerings have evolved significantly. Reputable cloud providers employ robust security measures, including advanced encryption, regular security audits, and compliance with various international standards. They have the resources and expertise to implement and maintain higher levels of security than individual organizations can achieve on their own. For businesses without the capacity or desire to manage complex security infrastructure, a cloud-based MAS offers a secure, compliant, and hassle-free alternative.

Scalability on Demand

On-Premises: Tailored to Specific Needs

An on-premises MAS deployment allows for extensive customization. Businesses can tailor the system to their specific IT and OT landscape, including guaranteed real-time responses. This capability is particularly beneficial for companies requiring deep integration with legacy systems and factory equipment. On the other hand, scaling on-premises solutions typically requires significant investment in hardware and infrastructure, as well as the technical expertise to manage these expansions.

Cloud-Based Solutions: Easy Scalability and Flexibility

Cloud-based MAS platforms shine in scalability. They allow businesses to scale their operations up or down with ease, without the need to invest in physical infrastructure. This scalability makes cloud solutions ideal for businesses experiencing rapid growth or fluctuating demands. Furthermore, cloud platforms are continually updated with the latest features and capabilities, ensuring businesses always have access to the most advanced tools without additional investment or effort in upgrading systems. A potential downside is that ultimate control of the deployment lies with the cloud provider, which can be a hurdle for highly regulated industries.

Maintenance and Updates

On-Premises: Hands-On, Resource-Intensive Maintenance

Maintaining an on-premises MAS requires dedicated IT personnel to manage hardware, perform regular software updates, and troubleshoot issues. This hands-on approach offers complete control over the maintenance schedule and system changes, but can be resource-intensive. Companies that already have specialized IT teams due to the nature of their operations may find this approach a natural fit.

Cloud-Based Solutions: Hassle-Free, Automatic Updates

Cloud-based solutions significantly reduce the burden of maintenance. The service provider typically manages all aspects of system maintenance, including regular updates, security patches, and technical support. Automatic updates ensure that the system is always running the latest software version, providing access to new features and improvements without additional effort or cost. This allows businesses to focus on their core operations, without the need to allocate and manage resources for system maintenance.

Cost Effectiveness

On-Premises: Higher Initial Investment but Predictable Long-Term Costs

Deploying any system on-premises typically involves a higher initial capital expenditure, including costs for hardware, software licensing, and installation. Over the long term, these costs can be more predictable, or at least there are no cloud-subscription fees to factor in. For organizations with the necessary infrastructure already in place, this model can be cost-effective, particularly when considering the longevity and stability of the investment.

Cloud-Based Solutions: Lower Upfront Costs with Ongoing Expenses

Cloud-based MAS solutions offer lower initial costs and much quicker setup compared to on-premises installations. Businesses avoid significant expenses on hardware and infrastructure, as the subscription model converts upfront investments into ongoing operational expenses. Combined with the ease of setup, this can be more cost-effective in the short term. However, for businesses with long-term, predictable usage patterns, it is important to consider the cumulative costs over an extended period.


Support

On-Premises: Customized and Direct Control

This model of deployment demands a significant commitment of internal resources for maintenance and troubleshooting, necessitating dedicated, skilled IT personnel. While on-prem provides an unmatched level of control and customization, as discussed earlier in this post, the reliance on in-house capabilities for supporting the MAS can be a considerable burden on manufacturing customers.

Cloud-Based Solutions: Broad, Expert Support with 24/7 Availability

Cloud-based MAS solutions boast a scalable, expert support structure, alleviating the need for an in-house IT team to manage the MAS deployment. This is particularly important for operations spread across multiple locations or time zones. Automatic updates and maintenance conducted by the provider ensure the system remains up-to-date without any additional effort from the customer side. Furthermore, troubleshooting is accelerated in a cloud-based system because the infrastructure is standardized and uniform. This consistency reduces complexity and variability, which significantly improves the efficiency and speed of support services.


The choice between deploying a MAS on-premises or in the cloud depends on various factors including data security needs, customization requirements, budget constraints, network reliability, and maintenance capabilities. Each option has its merits, and the decision should align with the specific operational, financial, and strategic objectives of the organization. At EthonAI, we offer both options to meet our customers’ needs effectively.

A story of why causal AI is necessary for root cause analysis in manufacturing

Traditional machine learning is designed for prediction and often struggles with root cause analysis. The article presents a short story demonstrating how causal AI overcomes this problem.

Why causality is needed for decision-making

Data-driven decisions are paramount to stay competitive in today’s manufacturing. However, for effective decisions, we need tools that transform data into actionable insights. Traditional machine learning tools, while great for predictions, fall short in decision-making due to their inability to grasp cause and effect relationships. They fail to understand how different decisions impact outcomes. To make truly informed decisions, understanding these cause and effect dynamics is crucial.

Causal AI provides manufacturers with entirely new insights by going beyond the prediction-focused scope of traditional machine learning. It seeks to uncover the causes behind outcomes, which enables us to assess and compare the outcomes of different decisions. This offers crucial insights for more informed root cause analysis. For manufacturers, this means not only predicting what will happen, but also knowing which decision taken now leads to a better outcome in the future.

What is causal AI?

Causal AI, at its core, is an advanced form of artificial intelligence that seeks to understand and quantify cause-and-effect relationships in data. In particular, causal AI aims to understand how one variable A influences another variable B. This is important for decision-making: if we want to change A with the goal of increasing B, we need to know how A influences B. Traditional machine learning only uses A to predict B, but cannot answer what happens to B if we change A, as we will see in an example below. The answer to this question, however, is essential for decision-making, in particular in the context of root cause analysis in manufacturing.

This article looks into the task of root cause analysis for quality improvement. The focus is to maximize “good” quality and minimize “bad” quality outcomes. Simply predicting when quality will drop is not enough in this setting. The objective is to identify and adjust specific production parameters (like adjusting a machine setpoint) when bad quality is observed, to restore good quality. Therefore, understanding the cause-and-effect relationships between these production parameters and the product quality is key. This knowledge allows us to pinpoint which parameters are causing quality issues and make necessary changes to achieve desired quality levels consistently. In the following, we tell a short story to demonstrate the capabilities of causal AI in this context.

Causal AI for root cause analysis

Let’s imagine a manufacturing company specializing in plastic Christmas trees, a seasonal product where quality and timeliness are key. The company faced a peculiar challenge: a noticeable drop in the quality of their plastic trees. Naturally, they turned to data for answers.

Their initial investigation was led by a skilled data scientist, who collected data about the production process. The production process consists of two steps: First, the plastic branches are sourced from a supplier. Second, the branches are put through a machine which attaches the branches to the trunk. There are two possible suppliers, A and B, and two possible machines, M1 and M2.

The data scientist used traditional machine learning techniques, which focused on predicting the quality based on the collected data. This led to an intriguing conclusion: the machine learning model suggested that machine M1 produced worse quality than M2. Based on this analysis, the data scientist recommended taking machine M1 out of service, which would lead to a substantial reduction in throughput and, hence, reduced production capacity. However, the story took a twist when the company decided to scrutinize both machines. To their astonishment, there was no recognizable difference in the settings of the machines or the machines themselves. This puzzling situation called for a deeper analysis, beyond what traditional machine learning could offer.

Luckily, a friend of the company’s data scientist is a renowned causal AI expert. The expert developed a tailored causal AI algorithm for the production process, seeking not just good predictions, but an understanding of the underlying cause-and-effect relationships in the production process. The causal AI model revealed an unexpected insight: the root cause of the quality drop was not the machine, but the supplier. In fact, it revealed that Supplier A delivered branches of worse quality than Supplier B. After talking to the factory workers, the company found out that the workers always put the branches of Supplier A through machine M1 and the branches of Supplier B through machine M2. They did this simply because the machines were closer to the boxes with the corresponding branches. Hence, all the low-quality branches of Supplier A ran through machine M1, which made machine M1 look as if it were causing the drop in quality.

But why did the traditional machine learning model fail to identify the true root cause? The reason is that its objective is prediction and, for this, knowing which machine the branches went through was enough to predict the quality perfectly. In particular, since the traditional machine learning model didn’t understand the underlying cause-and-effect relationships, it simply used all available parameters. However, by doing so, it also used the machine as a parameter, which, in this example, is a so-called mediator. By using this mediator, it “blocked” any indirect influence from the supplier via the machines. As a result, the influence of the supplier got lost. Since the causal AI understood the underlying cause-and-effect relationships, in particular the relationship between supplier and machine, it could correctly identify the true root cause.
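
The mechanism behind this failure can be reproduced in a few lines. The numbers below are made up for illustration: quality depends only on the supplier, while workers route most of each supplier’s branches to “their” machine. A naive group comparison blames the machine; holding the supplier fixed makes the machine effect vanish:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5_000

# Hypothetical recreation of the story: Supplier A (=0) delivers worse branches.
supplier = rng.integers(0, 2, n)  # 0 = Supplier A, 1 = Supplier B
# Workers route ~95% of each supplier's branches to "their" machine.
machine = np.where(rng.random(n) < 0.95, supplier, 1 - supplier)  # 0 = M1, 1 = M2
# Quality depends ONLY on the supplier; the machine has no causal effect.
quality = 0.6 + 0.3 * supplier + rng.normal(0, 0.05, n)

# Naive view: the machine is strongly associated with quality ...
naive_gap = quality[machine == 1].mean() - quality[machine == 0].mean()
# ... but within each supplier, the two machines perform identically.
adj_gap = np.mean([quality[(supplier == s) & (machine == 1)].mean()
                   - quality[(supplier == s) & (machine == 0)].mean()
                   for s in (0, 1)])

print(round(naive_gap, 3))  # ≈ 0.27: M2 *looks* better than M1
print(round(adj_gap, 3))    # ≈ 0.00: no machine effect once supplier is fixed
```

A purely predictive model is happy to use the machine variable, because it predicts quality almost perfectly; only an analysis that accounts for how supplier and machine relate recovers the true root cause.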

Armed with this causal insight, the company informed Supplier A about the quality of their branches, which they ultimately were able to improve with new specifications. As such, leveraging causal AI averted a prolonged production stop of machine M1, which would have cost the company a lot of money. All of this just because the traditional machine learning model focuses on prediction, but not on understanding the underlying cause-and-effect relationships. Only a causal AI model could identify and rectify the true root cause of the quality issue.

In this simplified scenario, it would be easy to carefully check all parameters and production steps manually. But imagine a real-world scenario, in which we have hundreds or even thousands of parameters across many process steps. In such a setting, the clear association between machine M1 and quality, identified by traditional methods, can easily be mistaken for a root cause. And manually checking for other influence factors would be tedious, if not impossible. In this case, causal AI can identify the root cause immediately and, as such, saves a lot of time and costs.

Opportunities and challenges of causal AI in manufacturing

The opportunity of causal AI is clear: it offers new ways for manufacturing to identify the true root causes of problems. This depth of insight empowers manufacturers to make decisions that address core issues, leading to enhanced efficiency, quality, and competitive advantage. 

However, the adoption of causal AI is challenging. One significant hurdle is the absence of off-the-shelf software that can be used without data-science expertise. Moreover, as the above example showed, even seasoned data scientists often lack experience with causal AI. This is mainly because causal AI is a relatively new field. Despite these challenges, the potential gains in operational understanding and performance are substantial.

If you’re interested in finding out how causal AI can help your problem-solving efforts, we invite you to book a demo and experience the impact firsthand.

Industrial anomaly detection: Using only defect-free images to train your inspection model

This article explains why it is important to use an inspection approach that does not require images of defective products in its training set, and what kind of algorithm is suited in practice.

Requirements for visual quality inspection

Industrial anomaly detection in quality inspection tasks aims to use algorithms that automatically detect defective products. It helps manufacturers achieve high quality standards and reduce rework.

This article focuses on industrial anomaly detection with image data. Modern machine learning algorithms can process this data to decide whether a product in an image is defective. To train such algorithms, a dataset of exemplary images is needed.

An important feasibility criterion for manufacturers is the way these training datasets need to be compiled. For instance, some naive algorithms require large datasets to work reliably (around one thousand images or more for each product variant). This is expensive and often infeasible in practice. That’s why we only consider so-called few-shot algorithms that work reliably with a small number of examples, specifically far fewer than one hundred images.

Another aspect that distinguishes algorithms is whether examples of defective products are needed. Here, we can broadly distinguish two classes of algorithms: (1) “generative algorithms” that can learn from just normal (or defect-free) products, and (2) “discriminative algorithms” that require both normal and anomalous (or defective) images.

This is an important distinction for two reasons. First, anomalies are often rare, and when manufacturing of a new product variant starts up, no defective data is available for training. Second, by definition “anomalous” is everything that is not normal, which makes it practically impossible to cover all possible anomalies with sufficient training data. The latter is the more important argument, so let’s look at it in more detail.

Figure 1: Example from PCB manufacturing. The green connectors on the bottom of the PCB need to be mounted correctly as shown in (a). Possible defects are misplaced, missing or incorrect connectors. Examples of missing connectors are shown in (b), (c), and (d).

Figure 1 illustrates this. The single example in (a) should already give you a good impression of the concept of “normal.” By contrast, the training images in (b) and (c) are by no means sufficient to define the concept of “anomalous” (e.g., other defect types such as discolorations or misplacements are not represented).

Choosing the right type of algorithm

To better understand how discriminative and generative models differ when applied to anomaly detection, we use the PCB example in Figure 1 to construct a hypothetical scenario. For the sake of simplicity, a discriminative algorithm can be thought of as a decision boundary in a high-dimensional feature space. Each image becomes a point in that space, and lies either on the “normal” or the “anomalous” side of the boundary. Figure 2 simplifies this even further, down to a two-dimensional feature space. Such algorithms look at the training data of the two classes (normal and anomalous) and try to extract discriminating features when constructing the decision boundary. As such, these algorithms are likely not robust to unseen and novel defect types.

Figure 2: The two subfigures show a simplified two-dimensional feature space of a discriminative model. The dashed line is the decision boundary of the model after training. The dots correspond to training and test images, where green means defect-free and red means defective. Dots with a black border were in the training set, the others were not. The letters refer to the images in Figure 1. (a) Only contains images that were used to construct the discriminative model (training images). (b) Contains both training and test images, highlighting the difficulties of a discriminative model to generalize to all types of anomalous images.

To see how a discriminative algorithm fails in practice, recall that anomalous is everything that is not normal, and consider that normal images tend to occupy only a small volume in the wide feature space. By contrast, the surrounding space of anomalous images is vast. It is thus very unlikely to gather sufficiently numerous and different examples of such images for training.

In the example of Figure 2, the training images (with a black outline) happen to cover just the lower part of the space, and the resulting decision boundary is good at making that distinction. But it does not encode the fact that defective products can also lie further above, to the right, or to the left – which is where the unseen example 1(d) happens to lie.

Figure 2 illustrates the problem with discriminative models, when defect types are not part of the training set. The decision boundary may end up working well on the training data, but previously unseen defects can easily end up on the “normal” side of the boundary. Concretely in this example, the image 1(d) happens to be closer in feature space to the non-defective images than the defective images 1(b) and 1(c).
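
This failure mode can be made concrete with a minimal numeric sketch of Figure 2 (all coordinates are invented for illustration): a linear classifier fit by least squares separates the training data well, yet places an unseen defect on the opposite side of the normal cluster into the “normal” region:

```python
import numpy as np

# Toy 2-D feature space as in Figure 2: normal images cluster near the
# origin, and the training anomalies all happen to lie BELOW the cluster.
normals   = np.array([[0.0, 0.0], [0.2, 0.1], [-0.1, 0.2], [0.1, -0.1]])
anomalies = np.array([[0.0, -2.0], [0.3, -1.8], [-0.2, -2.2]])  # like 1(b), 1(c)

X = np.vstack([normals, anomalies])
y = np.array([+1] * len(normals) + [-1] * len(anomalies))  # +1 = normal

# Fit a linear decision boundary by least squares (a minimal discriminative model).
A = np.column_stack([X, np.ones(len(X))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(p):
    return "normal" if np.dot(w, [p[0], p[1], 1.0]) > 0 else "anomalous"

# An unseen defect ABOVE the normal cluster, analogous to image 1(d):
print(predict([0.0, 2.0]))   # "normal" — the boundary never saw defects up there
print(predict([0.0, -2.0]))  # "anomalous" — matches the seen defect type
```

The boundary only encodes the normal-versus-below distinction present in training; everything above the cluster is confidently, and wrongly, labeled normal.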

For this reason, we strongly advocate using algorithms that focus on learning the concept of normality instead, and can thus be trained solely on normal images. Such algorithms can also benefit from defective images in their training set, to improve robustness to specific types of defects, but crucially, they do not require them. In ML terminology, we seek industrial anomaly detection algorithms that explain how normal data is generated, as opposed to discriminating normal from anomalous images. Such models represent the generative process behind normal data, which can be used to judge whether or not an image could have been created by this generative process. If not, the image is anomalous.
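
As a minimal sketch of this generative alternative (a Gaussian density with a Mahalanobis-distance threshold, standing in for the far more capable models used in practice), the model is fit only to normal samples and flags anything sufficiently unlikely, including defect types never seen during training:

```python
import numpy as np

# Few-shot set of defect-free samples in the same toy 2-D feature space.
rng = np.random.default_rng(7)
normals = rng.normal(0.0, 0.3, size=(50, 2))

# Model ONLY the normal data: mean and covariance of a Gaussian.
mu = normals.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(normals, rowvar=False))

def mahalanobis(p):
    d = p - mu
    return float(np.sqrt(d @ cov_inv @ d))

# Threshold just beyond the most extreme normal training sample.
threshold = max(mahalanobis(p) for p in normals) * 1.1

def is_anomalous(p):
    return mahalanobis(p) > threshold

print(is_anomalous(np.array([0.0, 0.1])))   # False: looks like normal data
print(is_anomalous(np.array([0.0, -2.0])))  # True: below the cluster
print(is_anomalous(np.array([0.0, 2.0])))   # True: above the cluster, UNSEEN type
```

Because the model describes where normal data lives rather than where seen defects lie, novel defect directions in feature space are rejected just as reliably as familiar ones.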


The Inspector offered by EthonAI provides a state-of-the-art solution for manufacturers to the problem of visual inspection. The EthonAI Inspector performs anomaly detection with generative algorithms that can be trained with just a few defect-free images. This is a great advantage in manufacturing environments, where gathering images is expensive, especially if examples of defects need to be in the training data. In addition, the algorithms we deploy are robust to unseen defects, as outlined above. We constantly observe that customers uncover defect types in their manufacturing process that they were unaware of before. This significantly improves the quality assurance process as a whole.

Generative modeling (or generative AI) has seen tremendous successes in the past years. It is expected that the usage of such models will continue to grow in manufacturing and help set new quality standards. Most real-world scenarios require knowledge of how normal images are generated, including factors of allowed variation such as lighting and position. EthonAI will continue to push the limits of such algorithms, and help you ensure that you don’t ship defective products to your customers.

Virtual Design of Experiment: How to optimize your production processes through digital tools

The article compares traditional and virtual Design of Experiments in manufacturing. It emphasizes the efficiency of virtual Design of Experiments in optimizing production processes using causal AI, while highlighting the challenges in data collection and advanced statistical software required for their effective implementation.

What is this article about?

Production processes are getting increasingly complex and, as such, it is difficult to make sure that they run optimally. To understand and optimize their production processes, manufacturers often turn to so-called “Design of Experiments” (DOEs), where they systematically test different parameter settings against each other. While DOEs can provide valuable insights to improve production processes, they are also time consuming and costly. Hence, DOEs are typically conducted infrequently and focus only on a subset of the production parameters. Such incomplete optimizations often lead to suboptimal settings.

In this article, we explore a possible solution: Virtual Design of Experiments. Virtual DOEs overcome the major drawbacks of DOEs by building a digital twin of the production process. As such, virtual DOEs allow for a comprehensive optimization in a cheaper and faster way compared to traditional DOEs. This allows them to be run more frequently and results in fully optimized production processes. In the following, we will discuss what DOEs are, how virtual DOEs work, and the challenges related to virtual DOEs.

What is a DOE?

DOE is a statistical method designed to experimentally assess how specific parameters influence outcomes in manufacturing processes. Its origins date back to Ronald Fisher in the 1920s, initially for agricultural applications. In the 1980s, Genichi Taguchi’s methods notably advanced its use in manufacturing. However, DOE’s full potential remains underexploited in the industry, especially outside of sectors like pharmaceuticals and semiconductors. This is often because of the significant time and effort required to master DOEs, which combine statistical know-how with domain-specific knowledge (for an in-depth overview of DOEs, we refer interested readers to Michel Baudin’s blog).

Let’s consider an example where we investigate the influence of oven temperature on final quality in a cake factory. The goal is to find the temperature that results in the best cake quality. To this end, we could use a DOE to investigate the influence of the oven temperature on the cake quality. The core principle is to keep all parameters in the production process constant while changing only the temperature and observing the quality outcome. For instance, we could bake cakes at two different temperature settings, Setting A = 170°C and Setting B = 180°C, using the exact same ingredients. Then, we compare the resulting cake quality for both temperature settings. If the quality increases when we change the temperature from 170°C to 180°C, we have found a strong indication of a better temperature setting and keep it for further experimentation.

Although the results of our DOE suggest that 180°C is superior to 170°C, it could be that 190°C is better still. Hence, to find the best temperature setting, we have to run multiple DOEs iteratively. Once we have optimized one parameter, we may want to continue with other parameters as well (e.g., the ingredients). We thus end up running multiple DOEs for different parameter combinations, which quickly becomes time-consuming and costly: not only does every DOE require planning and execution in the live process, but a new setting may actually turn out worse than the old one, leading to additional quality losses.

A potential solution is the virtual DOE, which leverages data collected throughout the production process to “virtually” run DOEs. This not only costs less money, but is also typically faster.

What is a Virtual DOE?

Virtual DOEs have only recently become feasible, thanks to the vast amounts of data collected in production processes and advances in statistical methods. They share the same goal as traditional DOEs: understand how production parameters influence quality and find the optimal settings for those parameters. Unlike traditional DOEs, virtual DOEs do not take place in the actual production process, but virtually in software. There is no need to change the actual production process, as the insights are gained by simulating the parameter changes virtually. This scales better to a large number of parameters and, most importantly, circumvents the risk of reducing product quality through experimentation on the live process.

In a nutshell, when conducting virtual DOEs, we take all available production data to build a digital twin of the production process. Such a virtualization allows us to run virtual experiments with different simulated temperature settings and optimize the parameter for the best quality results.
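A minimal sketch of this idea, under strong simplifying assumptions: we invent hypothetical historical production data, fit a simple quadratic model as a stand-in for the digital twin, and then sweep candidate temperature settings in the model instead of the factory. A real digital twin would be far richer than a one-variable polynomial fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical historical production data (illustrative only): the
# temperatures actually run in the past and the observed batch quality.
temps = rng.uniform(160, 200, size=500)
quality = 100 - 0.05 * (temps - 185) ** 2 + rng.normal(0, 1, size=500)

# "Digital twin" stand-in: a quadratic model fitted to the historical data.
coeffs = np.polyfit(temps, quality, deg=2)
twin = np.poly1d(coeffs)

# Virtual DOE: sweep candidate settings in the model, not in the factory.
candidates = np.arange(160, 201, 1)
best_temp = candidates[np.argmax(twin(candidates))]
print(f"Virtually optimal temperature: {best_temp}°C")
```

The key point is that once the model is built, evaluating a new candidate setting is a function call rather than a production run.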

Challenges of Virtual DOEs

To conduct virtual DOEs, two major challenges must be addressed: (1) collecting the right data from the production process and (2) using the right statistical software to build an environment that virtually emulates the production process.

While manufacturers already record large amounts of data, it may not always be the data needed for virtual DOEs. In order to run virtual DOEs, we need sufficient variation in production parameters to build a digital twin of a production process. Hence, when collecting data, one should consult process engineers to identify relevant data. This makes setting up virtual DOEs an interdisciplinary initiative, which requires IT experts to work closely with process engineers.

Moreover, the software needed to build virtual DOEs is statistically complex, because it has to make use of causal simulation. Causal simulation requires quantifying the impact of specific changes in production parameters on the final quality output, while controlling for a myriad of confounding variables that could skew results. Furthermore, it must be capable of handling large datasets with varying degrees of variability and correlation to ensure that the virtual experiments closely mimic real-world scenarios. Only recently have statistical methods with these capabilities transitioned from research to practical applications.
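Why controlling for confounders matters can be shown with a toy example. In the sketch below (all variables and coefficients are invented for illustration), ambient humidity influences both the temperature that operators choose and the final quality. A naive regression of quality on temperature then gives a badly biased effect estimate, while adjusting for the confounder recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Toy process: humidity confounds both the chosen temperature and quality.
humidity = rng.normal(50, 10, n)
temperature = 150 + 0.5 * humidity + rng.normal(0, 2, n)
quality = 80 + 0.3 * temperature - 0.2 * humidity + rng.normal(0, 1, n)

# Naive estimate: regress quality on temperature alone (confounded).
X_naive = np.column_stack([np.ones(n), temperature])
naive_effect = np.linalg.lstsq(X_naive, quality, rcond=None)[0][1]

# Adjusted estimate: include the confounder in the regression.
X_adj = np.column_stack([np.ones(n), temperature, humidity])
adj_effect = np.linalg.lstsq(X_adj, quality, rcond=None)[0][1]

print(f"Naive temperature effect:    {naive_effect:.2f}")  # biased by humidity
print(f"Adjusted temperature effect: {adj_effect:.2f}")    # close to true 0.3
```

Causal simulation in real production data faces the same problem with many more variables, which is what makes the underlying software statistically demanding.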


DOEs are an important tool for manufacturing companies to understand and optimize their production processes. However, they are time-consuming and costly, as they have to be conducted in the actual production process. Virtual DOEs run virtually in software and do not interfere with the actual production process. They can save a lot of time and money, but they are non-trivial, as they rely on the data collected from the production process and on advanced statistical software. Hence, enabling manufacturing companies to run virtual DOEs requires the right choice of data and software.

What data is needed for AI-based root cause analysis?

In manufacturing, one concern unites everyone from line workers to top management: the quality of the goods produced. Continuous improvement of the baseline quality and fast reaction to quality issues are the keys to success. AI-based root cause analysis is the essential tool for effective quality management, and the data is the fuel. However, what kind of data is needed for effective root cause analysis in manufacturing? This article provides an overview.

Quality Metrics

A common hurdle in quality management, surprisingly, lies in establishing a robust quality metric. If we cannot accurately measure how good the quality is, we cannot monitor its stability nor can we judge whether our improvement actions are successful.

Although we learned in Six Sigma training how to qualify a measurement such that it meets the standards of ANOVA’s gauge R&R, it’s not guaranteed that we can set up such a measurement in practice. And even if we have a solid measurement set up, how many of us have ever worked with rock-solid pass/fail criteria? And how many of us have always resisted the temptation of “can’t we just move the spec limits a bit?” when we had to solve a quality issue? The first step towards data-driven root cause analysis should always be to make sure that we have a quality metric that we can trust and that has a fixed target value.

Process Data

The second challenge is a simple but sometimes overlooked fact: the best algorithm will fail to find a root cause if that root cause hasn’t left its traces in the data that we use for the analysis. Collecting a bunch of production data and throwing it into an AI tool can lead to interesting insights, but if the AI only finds meaningless relations, it may well be because there was nothing useful to be found in the data.

In that case it makes sense to take a step back and ask: what kind of issues have we solved in the past? Would these issues have been detectable with the available data? If not, can we add a sensor that records the missing information? Domain expertise can often be worked into the data collection by linking data from different sources. The more expert knowledge goes into data collection, the more straightforward it becomes to translate the results of an AI-driven root cause analysis into improvement actions.

Diagram showing where in a production flow the EthonAI Analyst collects input and output data

Linking of Data

Now that we have quality data and process data, they must be linked together. It is not enough to know that the temperature in equipment A was 45°C and that the raw material came from Supplier B; we need to know which of the products that end up in the quality check were affected by these process conditions. Some manufacturers inscribe unique batch IDs on their products, some use RFID tags to track them, but sometimes we simply have a linear flow of products without any identification. In this case, we can rely on timestamps and knowledge of the time delay between process and quality check. There can be some uncertainty in this timestamp matching, but in most cases the AI algorithms are sufficiently robust to handle it.
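Timestamp-based linking can be sketched with pandas. The data below is invented, and the assumed ~30-minute delay between process step and quality check is an example parameter: we shift the quality timestamps back by the delay and match each record to the nearest process record, discarding matches that fall outside a tolerance window.

```python
import pandas as pd

# Hypothetical process log and downstream quality check, linked only by time.
process = pd.DataFrame({
    "ts": pd.to_datetime(["2024-05-01 08:00", "2024-05-01 09:00",
                          "2024-05-01 10:00"]),
    "temperature_c": [45.0, 47.5, 44.0],
})
quality = pd.DataFrame({
    "ts": pd.to_datetime(["2024-05-01 08:31", "2024-05-01 09:29",
                          "2024-05-01 10:33"]),
    "pass_fail": ["pass", "fail", "pass"],
})

# Assumed ~30 min delay between process step and quality check: shift the
# quality timestamps back before matching to the nearest process record.
quality["ts_process_est"] = quality["ts"] - pd.Timedelta(minutes=30)

linked = pd.merge_asof(
    quality.sort_values("ts_process_est"),
    process.sort_values("ts"),
    left_on="ts_process_est",
    right_on="ts",
    direction="nearest",
    tolerance=pd.Timedelta(minutes=15),  # drop matches that are too far off
)
print(linked[["temperature_c", "pass_fail"]])
```

The tolerance window is one simple way to encode the uncertainty of timestamp matching: records with no plausible process counterpart simply remain unmatched.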

Routing History

There are many production setups in which multiple machines can perform the same task and, depending on availability, one machine or another is used for a given product. In this case, the routing information is highly valuable data for root cause analysis. Even if the equipment is too old to produce and transmit data about process conditions, the simple fact that one machine was used for many of the failed products can give a crucial hint to the process engineers, who can then track down and fix the issue.
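Even the simplest analysis of routing data can surface such a hint. The sketch below uses invented routing records for two hypothetical interchangeable machines and compares their failure rates:

```python
from collections import Counter

# Hypothetical routing history: which of two interchangeable machines
# processed each product, plus the final quality outcome.
records = [
    ("machine_A", "pass"), ("machine_A", "pass"), ("machine_A", "pass"),
    ("machine_A", "fail"), ("machine_B", "fail"), ("machine_B", "fail"),
    ("machine_B", "fail"), ("machine_B", "pass"), ("machine_A", "pass"),
    ("machine_B", "fail"),
]

totals = Counter(machine for machine, _ in records)
fails = Counter(machine for machine, result in records if result == "fail")

# Even without process-condition data, per-machine failure rates can point
# engineers to the equipment worth inspecting.
for machine in sorted(totals):
    rate = fails[machine] / totals[machine]
    print(f"{machine}: {rate:.0%} failure rate")
```

With more machines and process steps, the same idea generalizes to statistical tests over the routing history rather than a raw comparison of rates.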

Process Sequence 

Lastly, sophisticated root cause analysis tools leverage information on how the products flow through the sequence of process steps to deduce causal relationships and map out chains of effects. Providing these tools with chronological process sequences can rule out irrelevant causal connections, enhancing both the speed and reliability of the analysis.


When embarking on the journey of AI-based root cause analysis in manufacturing, remember these key points: 

  • prioritize a robust quality metric, 
  • integrate expert knowledge in data collection, 
  • establish clear links between process and quality data, 
  • value routing information, 
  • and utilize chronological process information. 

By focusing on these areas, manufacturers can significantly enhance their quality management processes, leading to operational excellence and sustained success.

Introducing the Manufacturing Analytics System

Modern factories generate substantial amounts of data, but frequently, it is not effectively utilized. This article explores how a new software category — the Manufacturing Analytics System — helps manufacturers turn their data into valuable insights for productivity improvement.

The need for a new software category in the manufacturing industry

The manufacturing industry is currently undergoing a significant transformation. Growing process complexity, changing macroeconomic trends, and rapid technological advancement make it increasingly urgent and difficult to achieve operational excellence.

In response to these changes, manufacturers need to find new ways to improve their productivity. Recent technological advances are a great opportunity to support manufacturers in this endeavor. Over the past years, there has been considerable progress in manufacturing data collection, which has been spurred by the development of IIoT sensor technologies, standardized network protocols, and cloud-based storage and computation. However, the mere collection of data does not automatically lead to increased productivity in factories. Often, this data is not effectively utilized to drive improvements.

Despite significant digitization efforts, the effective use of analytical tools in today’s factories remains limited. For many manufacturers, the situation can be best described as being data-rich but information-poor. According to IBM, approximately 90% of all sensor data is never analyzed. This is disappointing, because the lack of operational insights is currently considered a key obstacle that manufacturers need to overcome.

What has led us to this situation? In essence, 21st century factories are still managed with 20th century tools. Data is stored in disjoint sources, software is fragmented and verticalized, and there is no standardization across sites and teams. Consequently, key employees such as operations managers, process experts, and data scientists are bogged down by the extensive effort required to aggregate and clean data. It is evident that we need unified analytical standards and workflows to leverage existing data much more effectively.

The answer to these challenges is a new category of software: the Manufacturing Analytics System (MAS). A MAS creates a common context across disparate data sources, analyzes the data with the latest AI techniques, and makes the results accessible in a suite of interoperating applications. The applications in a MAS are tailored to the different people involved in achieving operational excellence. A MAS serves as an intermediary between data and users, provides deeper insights faster, enables new types of automation, and streamlines the feedback of decisions back to the factories. It makes employees considerably more effective and improves operational KPIs sustainably.

Under the hood of the Manufacturing Analytics System

Manufacturing Analytics Systems are designed to serve end-users across all levels of operational excellence, ranging from factory floor personnel to upper management. A MAS is structured around three layers that transform diverse manufacturing data into valuable insights for productivity improvement:

  • an Application Layer,
  • a Model Layer,
  • and a Context Layer.

The Application Layer contains user-facing software tools that enable productivity improvement. These tools are built on top of a Model Layer, which houses specialized AI models that are designed to generate insights or decisions from manufacturing data. The Context Layer is responsible for gathering data from a wide range of origins and channeling it into the Model Layer for processing. Each of the layers is described in more detail below.

Context Layer

The Context Layer serves as the foundation of a MAS. It prepares and organizes the data for its use in the Model Layer. This layer does not duplicate existing databases. Instead, it stores only relevant data, merged from different sources, in a unified and aggregated format. It provides the crucial link across disparate data sources. This is achieved by mapping or creating common identifiers like timestamps, part IDs, and batch IDs. The context and connections in this layer enable comprehensive analysis across different datasets.
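The linking of identifiers across sources can be illustrated with a small sketch. The source systems, column names, and values below are invented examples: an MES-style process log and a LIMS-style quality table use different names for the same batch identifier, and the Context Layer maps them onto one common key to produce a merged, analysis-ready view.

```python
import pandas as pd

# Hypothetical raw sources: a process log keyed by batch ID and a quality
# table keyed by the same ID under a different column name.
mes = pd.DataFrame({
    "batch_id": ["B001", "B002", "B003"],
    "line": ["L1", "L2", "L1"],
    "oven_temp_c": [178, 183, 171],
})
lims = pd.DataFrame({
    "lot": ["B001", "B002", "B003"],
    "yield_pct": [97.2, 98.8, 91.5],
})

# Map the differing identifiers onto one common key and keep only the
# merged, aggregated view rather than copying the full source databases.
unified = mes.merge(lims.rename(columns={"lot": "batch_id"}), on="batch_id")
print(unified)
```

The same principle extends to timestamps and part IDs; the point is that the heavy lifting of identifier mapping happens once, in the Context Layer, instead of in every analysis.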

Model Layer

The Model Layer comprises advanced AI models for complex data analysis in manufacturing. Unlike common data platforms that use off-the-shelf algorithms (e.g., Random Forests, GBMs, ResNets), this layer involves tailored models specifically designed for manufacturing tasks (e.g., root cause analysis, visual inspection, material flow analysis). Tailored models in the Model Layer enable a MAS to effectively address challenges where generic approaches fail.

Examples include EthonAI’s graph algorithms for root cause analysis, LLMs trained on manufacturing queries, and specialized computer vision models for quality inspection. 

Application Layer

The Application Layer contains the user-facing applications that leverage the data processed by the underlying layers. It is a no-code/low-code environment where users engage with the relevant information in tools that are custom-built for manufacturing workflows. This layer is designed to be intuitive for a large set of user personas and to directly integrate into their workflows.

Examples include EthonAI’s Analyst, Inspector, Observer, Tracker, and Miner software.

How will the Manufacturing Analytics System change the industry?

Over the past five years, industry leaders have significantly improved their data acquisition capabilities. Over the next five years, we expect the mid-market segment to catch up. This opens the door to unprecedented productivity levels in the industry. But to really capitalize on their data assets, manufacturers need stronger and more widespread analytics. Manufacturing Analytics Systems will quickly become essential for this purpose.

A MAS offers manufacturers advanced analytical capabilities and user-centric applications to improve operational decision-making. It equips manufacturers with the right set of tools to operate as effectively as lighthouse factories. The MAS is not just a technological advancement; it represents a paradigm shift in manufacturing. It commoditizes access to advanced analytics for the wealth of manufacturing data and enables widespread improvement of operations.

EthonAI is building the applications and infrastructure to lead this transformation. Explore the Use Cases on our website to discover how EthonAI’s MAS is already used by industry leaders to cut production losses and achieve operational excellence today.