Design of Experiments (DOE)

Design of Experiments (DOE) is a planned statistical method for testing factors, interactions, and settings so teams can understand cause-and-effect relationships and optimize process or design performance.



Definition

Design of Experiments (DOE) is a structured approach for planning, running, and analyzing experiments where input factors are deliberately changed to study their effect on one or more response variables. DOE helps teams identify important factors, interactions, curvature, robust settings, and tradeoffs more efficiently than one-factor-at-a-time testing.
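One way a two-level design can flag curvature is by adding center points. The sketch below assumes a hypothetical two-factor response with a quadratic term in A and no measurement noise, so the correct answer is known. If the response contained only linear and interaction terms, the average of the corner runs would equal the average at the center. The difference between the two estimates the combined pure-quadratic effect.

```python
import itertools
import statistics

def response(a, b):
    """Hypothetical response in coded units (-1 = low, 0 = center, +1 = high):
    y = 20 + 3a + 2b - 4a**2. There is no noise in this sketch."""
    return 20 + 3 * a + 2 * b - 4 * a ** 2

# Four corner runs of a 2^2 factorial plus three center-point runs.
corners = [response(a, b) for a, b in itertools.product((-1, 1), repeat=2)]
centers = [response(0, 0) for _ in range(3)]

# Linear and interaction terms average out over the corners, so any gap
# between the corner mean and the center mean comes from squared terms.
curvature = statistics.mean(corners) - statistics.mean(centers)
print("curvature estimate:", curvature)  # -> curvature estimate: -4
```

A nonzero estimate shows that curvature exists but not which factor causes it. Separating the quadratic terms requires a response surface design. In a real study, the replicated center points also provide a pure-error estimate for judging whether the gap is larger than noise.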

DOE can be used for manufacturing processes, product design, service processes, software settings, reliability studies, and improvement projects. Common designs include screening designs, factorial designs, fractional factorial designs, response surface designs, mixture designs, and Taguchi-style robust design experiments.
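As a minimal, standard-library-only sketch, the code below generates two of these designs in coded units (-1 = low, +1 = high). One is a full factorial. The other is a half fraction, built by setting the last factor equal to the product of the others. The function names are illustrative and do not come from any particular DOE package.

```python
import itertools
import math

def full_factorial(k):
    """All 2**k combinations of k two-level factors in coded units (-1/+1)."""
    return [list(run) for run in itertools.product((-1, 1), repeat=k)]

def half_fraction(k):
    """A 2**(k-1) fractional factorial: a full factorial in the first k-1
    factors, with the last factor set to their product (for k = 4: D = ABC)."""
    return [list(base) + [math.prod(base)]
            for base in itertools.product((-1, 1), repeat=k - 1)]

for k in (3, 4, 5):
    print(f"{k} factors: {len(full_factorial(k))} full-factorial runs, "
          f"{len(half_fraction(k))} half-fraction runs")
```

Halving the runs costs aliasing. With four factors, the generator D = ABC gives the defining relation I = ABCD. Each main effect is then aliased with a three-factor interaction, and the two-factor interactions are aliased in pairs (AB with CD, AC with BD, AD with BC).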

History

DOE grew out of Ronald Fisher's statistical work on agricultural field experiments at Rothamsted Experimental Station in the 1920s and 1930s and was later expanded through industrial statistics, quality engineering, and Six Sigma practice. It became a core improvement method because many process problems are driven by interactions that simple comparisons or isolated trials do not reveal.

In quality engineering, DOE is closely tied to robust design, Taguchi methods, response surface methodology, process optimization, and Design for Six Sigma. Modern software has made analysis easier, but sound experimental planning is still the main determinant of useful results.

When to Use

Use DOE when a team needs to understand which inputs affect an output, optimize settings, reduce variation, validate suspected causes, improve yield, study interactions, or make a design robust to noise. DOE is valuable when changes can be controlled and the cost of random trial-and-error is high.

DOE is not the first step for every problem. Teams should first clarify the problem, verify the measurement system, remove obvious special causes so the process is reasonably stable, and use process knowledge to select practical factors and ranges.

Step-by-Step

Define the objective. State whether the experiment is for screening, optimization, robustness, confirmation, or learning.
Select responses. Define measurable outputs such as yield, dimension, strength, cycle time, defect rate, cost, or customer performance.
Verify measurement systems. Confirm the response data can be trusted before experimenting.
Choose factors and levels. Select controllable inputs, practical ranges, noise factors, and constraints with subject-matter experts.
Select the design. Choose full factorial, fractional factorial, response surface, mixture, Taguchi, or another design based on objective and resources.
Randomize and block where appropriate. Protect the study from time, lot, shift, machine, material, or environmental bias.
Run the experiment carefully. Follow the run order, record actual settings, document abnormalities, and protect safety and quality.
Analyze effects and interactions. Use effect plots, regression, ANOVA, and residual checks, and judge results by practical significance, not p-values alone.
Confirm the result. Run confirmation trials at selected settings and verify performance in real operating conditions.
Standardize and control. Update settings, standard work, control plans, monitoring, and training.
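The randomize, run, and analyze steps above can be sketched for a hypothetical 2^3 study run in two blocks, such as two shifts. The factor labels, response function, +5 block shift, and random seed are all assumptions chosen so the correct answer is known. Because the sketch has no measurement noise, it shows only the bookkeeping, not a full analysis with ANOVA and residual checks. Blocking on the sign of the ABC contrast is a standard way to split a 2^3 design into two blocks of four while giving up only the three-factor interaction.

```python
import itertools
import random

FACTORS = ("A", "B", "C")  # hypothetical factor labels

def true_response(a, b, c, block):
    """Hypothetical noise-free process in coded units:
    y = 50 + 4a - 3b + 2ab, plus a +5 shift for every run made in block 2.
    Factor C has no real effect."""
    return 50 + 4 * a - 3 * b + 2 * a * b + (5 if block == 2 else 0)

def plan_runs(seed=1):
    """Build a 2^3 design, block on the sign of ABC, and randomize within blocks."""
    rng = random.Random(seed)
    design = list(itertools.product((-1, 1), repeat=3))
    plan = []
    for block, abc_sign in ((1, -1), (2, 1)):
        # Confound ABC with blocks: ABC = -1 in block 1, ABC = +1 in block 2.
        runs = [run for run in design if run[0] * run[1] * run[2] == abc_sign]
        rng.shuffle(runs)  # random run order inside the block
        plan.extend((block, run) for run in runs)
    return plan

def estimate_effects(results):
    """Effect of each term = mean(y where its contrast is +1) - mean(y where it is -1)."""
    half = len(results) / 2
    effects = {}
    for size in (1, 2, 3):
        for idx in itertools.combinations(range(3), size):
            name = "".join(FACTORS[i] for i in idx)
            total = 0.0
            for run, y in results:
                sign = 1
                for i in idx:
                    sign *= run[i]  # contrast column = product of factor columns
                total += sign * y
            effects[name] = total / half
    return effects

plan = plan_runs()
# "Run" the plan in order, recording each run's settings and its result.
results = [(run, true_response(*run, block)) for block, run in plan]
effects = estimate_effects(results)
for name, value in effects.items():
    print(f"{name:>3}: {value:+.1f}")
```

Each effect is twice its model coefficient: A = +8, B = -6, and AB = +4, with C, AC, and BC at zero. The block shift appears only in the ABC estimate (+5), which is why that interaction is the one deliberately confounded with blocks.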

Examples

Injection molding: A DOE studies temperature, pressure, cooling time, and material lot to reduce warpage and improve capability.
Welding: A team tests current, travel speed, and shielding gas flow to improve strength while reducing spatter.
Service process: A call-center team tests routing rules, script structure, and knowledge-base prompts to improve first-contact resolution.
Product design: Engineers use a response surface design to optimize strength and weight across material thickness and geometry.
Robust design: A team tests settings across noise conditions to find a process window that performs consistently.
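The robust design example can be sketched as a crossed array. Every combination of control settings is run at every level of a noise factor, and the team looks for settings whose output varies least across the noise. The control factors C1 and C2, the noise factor N, and the response function are hypothetical. The sketch loosely follows the two-step approach often associated with Taguchi methods: first choose settings that minimize variation, then use a factor that moves only the mean to reach the target.

```python
import itertools
import statistics

NOISE_LEVELS = (-1, 1)  # hypothetical noise factor N at its low and high level

def response(c1, c2, n):
    """Hypothetical process in coded units: C1 shifts the mean only, while the
    C2 x N interaction lets C2 change how strongly the noise factor moves y."""
    return 50 + 2 * c1 + 4 * n - 3 * c2 * n

# Inner array (control settings) crossed with outer array (noise levels).
summary = {}
for c1, c2 in itertools.product((-1, 1), repeat=2):
    ys = [response(c1, c2, n) for n in NOISE_LEVELS]
    summary[(c1, c2)] = (statistics.mean(ys), statistics.stdev(ys))

for (c1, c2), (mean, sd) in summary.items():
    print(f"C1={c1:+d} C2={c2:+d}: mean={mean:.1f} sd={sd:.2f}")

# Step 1: keep the settings with the smallest spread across noise conditions.
min_sd = min(sd for _, sd in summary.values())
robust = [s for s, (_, sd) in summary.items() if sd == min_sd]
print("Lowest-variation settings:", robust)
```

Both settings with C2 = +1 have the same small spread. C1 then acts as an adjustment factor: it moves the mean from 48 to 52 without changing the spread, so it can be set to hit the target.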

Common Pitfalls

Changing one factor at a time (OFAT). OFAT testing misses interactions and can require more runs for less learning.
Poor factor ranges. Ranges that are too narrow hide effects; ranges that are unsafe or unrealistic create risk.
Ignoring measurement error. If the measurement system is weak, DOE conclusions can be false.
No randomization or blocking. Time, lot, and sequence effects can masquerade as factor effects.
Overfitting the model. Check statistically significant terms against residual plots, practical effect size, and confirmation runs before relying on the model.
Skipping sustainment. Optimized settings must become controlled operating standards.
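The first pitfall can be made concrete with a small simulation. The response function below is hypothetical and has a strong AB interaction, with higher values better. Starting from (-1, -1), OFAT rejects each single change because each one alone lowers the response. As a result, it never tries the combination where both factors are high.

```python
import itertools

def response(a, b):
    """Hypothetical response with a strong A x B interaction (higher is better):
    y = 6 + a + b + 2ab in coded units."""
    return 6 + a + b + 2 * a * b

def ofat(baseline=(-1, -1)):
    """Change one factor at a time from the baseline; keep a change only if y improves."""
    best = list(baseline)
    best_y = response(*best)
    runs = 1
    for i in range(len(best)):
        trial = list(best)
        trial[i] = -trial[i]  # move factor i to its other level
        y = response(*trial)
        runs += 1
        if y > best_y:
            best, best_y = trial, y
    return tuple(best), best_y, runs

def factorial():
    """Run every combination of the two factors and keep the best one."""
    results = {run: response(*run) for run in itertools.product((-1, 1), repeat=2)}
    best = max(results, key=results.get)
    return best, results[best], len(results)

print("OFAT:     ", ofat())       # -> ((-1, -1), 6, 3)
print("Factorial:", factorial())  # -> ((1, 1), 10, 4)
```

Here OFAT uses three runs and settles at y = 6. The four-run factorial finds y = 10 and also supplies the data needed to estimate the AB interaction.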