Virtual Design of Experiment: How to optimize your production processes through digital tools

by Dr. Tobias Hatt

The article compares traditional and virtual Design of Experiments in manufacturing. It emphasizes the efficiency of virtual Design of Experiments in optimizing production processes using causal AI, and it highlights two challenges for implementing them effectively: collecting suitable data and using advanced statistical software.

What is this article about?

Production processes are becoming increasingly complex, which makes it difficult to ensure that they run optimally. To understand and optimize their production processes, manufacturers often turn to so-called “Design of Experiments” (DOEs), in which they systematically test different parameter settings against each other. While DOEs can provide valuable insights to improve production processes, they are also time-consuming and costly. Hence, DOEs are typically conducted infrequently and cover only a subset of the production parameters. Such incomplete optimizations often lead to suboptimal settings.

In this article, we explore a possible solution: Virtual Design of Experiments. Virtual DOEs overcome the major drawbacks of DOEs by building a digital twin of the production process. As such, virtual DOEs allow for a comprehensive optimization in a cheaper and faster way compared to traditional DOEs. This allows them to be run more frequently and results in fully optimized production processes. In the following, we will discuss what DOEs are, how virtual DOEs work, and the challenges related to virtual DOEs.

What is a DOE?

DOE is a statistical method designed to experimentally assess how specific parameters influence outcomes in manufacturing processes. Its origins date back to Ronald Fisher in the 1920s, initially for agricultural applications. In the 1980s, Genichi Taguchi’s methods notably advanced its use in manufacturing. However, DOE’s full potential remains underexploited in the industry, especially outside of sectors like pharmaceuticals and semiconductors. This is often because of the significant time and effort required to master DOEs, which combine statistical know-how with domain-specific knowledge (for an in-depth overview of DOEs, we refer interested readers to Michel Baudin’s blog).

Let’s consider an example in which we investigate the influence of oven temperature on the final quality in a cake factory. The goal is to find the temperature that results in the best cake quality, and a DOE is one way to do so. The core principle is to keep all other parameters in the production process constant, change only the temperature, and observe the quality outcome. For instance, we could bake cakes at two different temperature settings, Setting A = 170°C and Setting B = 180°C, using the exact same ingredients. Then, we compare the resulting cake quality for both temperature settings. If the quality increases when we change the temperature from 170°C to 180°C, we have found a strong indication of a better temperature setting and keep it for further experimentation.
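
As a minimal sketch of this two-setting comparison, the snippet below computes the mean quality per setting and a Welch t-statistic for the difference. The quality scores are made-up illustrative numbers, and the helper name `welch_t` is hypothetical rather than part of any DOE tool.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical quality scores (0-10 scale) for cakes baked at each setting.
quality_a = [6.8, 7.1, 6.9, 7.0, 7.2]  # Setting A = 170°C
quality_b = [7.6, 7.9, 7.7, 8.0, 7.8]  # Setting B = 180°C


def welch_t(x, y):
    """Welch's t-statistic for the difference in means (y minus x)."""
    standard_error = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(y) - mean(x)) / standard_error


effect = mean(quality_b) - mean(quality_a)
t_stat = welch_t(quality_a, quality_b)
print(f"Mean quality change from A to B: {effect:.2f}")
print(f"Welch t-statistic: {t_stat:.2f}")
```

A large positive t-statistic relative to the sample size suggests that the improvement is unlikely to be noise alone. In practice, one would also randomize the run order so that drift in the process does not masquerade as a temperature effect.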

Although the results of our DOE suggest that 180°C is superior to 170°C, it could also be that 190°C is even better than 180°C. Hence, in order to find the best setting for the temperature, we have to iteratively run multiple DOEs. Once we have optimized one parameter, we may want to continue optimizing other parameters as well (e.g., the ingredients). As such, we will have to run multiple DOEs for different parameter combinations, which quickly becomes time-consuming and costly. This is not only because running a DOE requires planning, but also because a new setting may actually be worse than the old one, which leads to additional quality losses.
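
To illustrate how quickly the number of experiments grows, the sketch below counts the runs of a full-factorial design for a handful of parameters. The parameter names and levels are assumptions made up for the cake example.

```python
from itertools import product

# Hypothetical parameters and candidate levels for the cake example.
levels = {
    "oven_temperature_c": [170, 180, 190],
    "baking_time_min": [25, 30, 35],
    "sugar_g": [180, 200, 220],
    "flour_type": ["A", "B"],
}

# A full-factorial design tests every combination of levels once.
runs = list(product(*levels.values()))
print(f"Full-factorial runs: {len(runs)}")

# Varying one parameter at a time needs one baseline run plus one run per
# alternative level, but it cannot detect interactions between parameters
# (e.g., between temperature and baking time).
one_at_a_time = 1 + sum(len(values) - 1 for values in levels.values())
print(f"One-parameter-at-a-time runs: {one_at_a_time}")
```

Each additional parameter multiplies the full-factorial count by its number of levels, and every run consumes real production time and material.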

Virtual DOEs are a potential solution: they leverage data collected throughout the production process to run DOEs “virtually”. This not only costs less money, but is also typically faster.

What is a Virtual DOE?

Virtual DOEs have only recently become feasible because of the vast amount of data collected in production processes and advances in statistical methods. Virtual DOEs have the same goal as traditional DOEs: understand how production parameters influence the quality and find the optimal setting for those parameters. Unlike traditional DOEs, virtual DOEs do not take place in the actual production process, but in software, where the insights are gained by simulating parameter changes. This scales better to a large number of parameters and, most importantly, does not require changing the actual production process, which avoids the risk of reducing the products’ quality.

In a nutshell, when conducting virtual DOEs, we take all available production data to build a digital twin of the production process. Such a virtualization allows us to run virtual experiments with different simulated temperature settings and optimize the parameter for the best quality results.
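
The idea can be sketched with a deliberately simple stand-in for a digital twin: a quadratic curve fitted to historical temperature and quality records, which is then queried for temperatures that were never tested on purpose. The records are hypothetical and noise-free, the function names are illustrative, and a real virtual DOE would model many parameters and account for confounding rather than fit a single curve.

```python
# Hypothetical historical records: (oven temperature in °C, quality score).
history = [(170, 6.31), (174, 7.19), (178, 7.75), (182, 7.99),
           (186, 7.91), (190, 7.51), (194, 6.79)]


def fit_quadratic(points):
    """Least-squares fit of quality = a + b*x + c*x^2, x = T - mean(T)."""
    center = sum(t for t, _ in points) / len(points)
    rows = [[1.0, t - center, (t - center) ** 2] for t, _ in points]
    ys = [q for _, q in points]
    # Normal equations (X^T X) beta = X^T y for the basis [1, x, x^2].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(3):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [x - factor * p for x, p in zip(A[r], A[col])]
                rhs[r] -= factor * rhs[col]
    beta = [rhs[i] / A[i][i] for i in range(3)]
    return center, beta


def predict(model, temperature):
    """Simulated quality at a given temperature under the fitted model."""
    center, (a, b, c) = model
    x = temperature - center
    return a + b * x + c * x * x


model = fit_quadratic(history)

# The "virtual experiment": sweep candidate temperatures in software only.
candidates = range(170, 195)
best = max(candidates, key=lambda t: predict(model, t))
print(f"Best simulated temperature: {best}°C "
      f"(predicted quality {predict(model, best):.2f})")
```

The sweep costs no production time, but its result is only as trustworthy as the fitted model and the data behind it, so a promising setting would typically still be confirmed in the real process.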

Challenges of Virtual DOEs

Conducting virtual DOEs involves two major challenges: (1) collecting the right data from the production process and (2) using the right statistical software to build the environment that allows us to virtually emulate the production process.

While manufacturers already record large amounts of data, it may not always be the data needed for virtual DOEs. In order to run virtual DOEs, we need sufficient variation in production parameters to build a digital twin of a production process. Hence, when collecting data, one should consult process engineers to identify relevant data. This makes setting up virtual DOEs an interdisciplinary initiative, which requires IT experts to work closely with process engineers.
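
A first, rough check of whether logged data contains enough variation might look like the sketch below. The parameter names, the logged values, and the `min_distinct` threshold are assumptions for illustration; which parameters matter and how much variation is enough should come from the process engineers.

```python
# Hypothetical logged values per production parameter.
logs = {
    "oven_temperature_c": [170, 174, 178, 182, 186, 190, 194],
    "baking_time_min": [30, 30, 30, 30, 30, 30, 30],
    "humidity_pct": [40, 41, 40, 42, 41, 40, 41],
}


def variation_report(data, min_distinct=3):
    """Summarize how much each parameter actually varied in the logs."""
    report = {}
    for name, values in data.items():
        distinct = len(set(values))
        report[name] = {
            "distinct": distinct,
            "range": max(values) - min(values),
            # Assumed rule of thumb: too few distinct values means the
            # effect of this parameter cannot be learned from the logs.
            "enough_variation": distinct >= min_distinct,
        }
    return report


report = variation_report(logs)
for name, stats in report.items():
    print(name, stats)
```

In this example the baking time never changed, so no model could learn its effect on quality from these logs alone.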

Moreover, the software needed to build virtual DOEs is statistically complex, because it has to make use of causal simulation. Causal simulation requires quantifying the impact of specific changes in production parameters on the final quality output, while controlling for a myriad of confounding variables that could skew results. Furthermore, the software must be capable of handling large datasets with varying degrees of variability and correlation to ensure that the virtual experiments closely mimic real-world scenarios. Only recently have statistical methods with these capabilities transitioned from research to practical applications.
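
To show why controlling for confounders matters, the sketch below compares a naive effect estimate with one adjusted by stratification. The data are hypothetical: a flour batch affects quality and also influences which temperature operators choose. Stratification is used here as one simple adjustment technique; it is not necessarily what a production-grade causal simulation would use.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (flour batch, oven temperature in °C, quality).
# Batch B yields lower quality AND is mostly baked at 180°C, so the
# batch confounds the relationship between temperature and quality.
records = (
    [("A", 170, 7.0)] * 4 + [("A", 180, 7.5)] * 1
    + [("B", 170, 5.0)] * 1 + [("B", 180, 5.5)] * 4
)


def naive_effect(rows, low=170, high=180):
    """Difference in mean quality between settings, ignoring the batch."""
    q_low = [q for _, t, q in rows if t == low]
    q_high = [q for _, t, q in rows if t == high]
    return mean(q_high) - mean(q_low)


def adjusted_effect(rows, low=170, high=180):
    """Average the within-batch effects, weighted by batch frequency."""
    strata = defaultdict(list)
    for row in rows:
        strata[row[0]].append(row)
    total = len(rows)
    return sum(len(s) / total * naive_effect(s, low, high)
               for s in strata.values())


print(f"Naive effect of 170 -> 180°C:    {naive_effect(records):+.2f}")
print(f"Adjusted effect of 170 -> 180°C: {adjusted_effect(records):+.2f}")
```

The naive comparison suggests that the higher temperature hurts quality, whereas within each batch it improves quality. A virtual DOE built on the naive estimate would recommend the wrong setting.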


Conclusion

DOEs are an important tool for manufacturing companies to understand and optimize their production processes. However, they are time-consuming and costly because they have to be conducted in the actual production process. Virtual DOEs run in software and do not interfere with the actual production process. They can save considerable time and money, but they are non-trivial because they rely on data collected from the production process and on advanced statistical software. Hence, enabling manufacturing companies to run virtual DOEs requires the right choice of data and software.

Dr. Tobias Hatt

Tobias Hatt is a Machine Learning Engineer at EthonAI, where he works on root cause analysis. He is particularly interested in applications of causal machine learning that help optimize manufacturing processes. Tobias holds a PhD from ETH Zurich, where his research focused on causal machine learning.