ANOVA

ANOVA helps Lean Six Sigma teams test whether process settings, machines, suppliers, shifts, materials, or treatments produce statistically meaningful differences in a continuous response.


Metric, Measurement, Decision Support

Definition

ANOVA, or Analysis of Variance, is a statistical method for comparing the means of three or more groups by partitioning the observed variation into between-group and within-group sources. It tests whether the differences between group averages are large relative to the variation within groups. The result is usually expressed as an F statistic and a p-value.
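For the one-way case, the comparison described above can be written out as a worked formula. The notation is an assumption introduced here for illustration: k groups, n_i observations in group i, N observations in total, group means \bar{y}_i, and grand mean \bar{y}.

```latex
\begin{aligned}
SS_{\text{total}} &= SS_{\text{between}} + SS_{\text{within}} \\
SS_{\text{between}} &= \sum_{i=1}^{k} n_i \left(\bar{y}_i - \bar{y}\right)^2 \\
SS_{\text{within}} &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left(y_{ij} - \bar{y}_i\right)^2 \\
F &= \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)}
\end{aligned}
```

A large F means the group means are spread further apart than the within-group variation would suggest. The p-value is the upper-tail probability of that F under an F distribution with k - 1 and N - k degrees of freedom.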

By itself, ANOVA does not tell a team which practical action to take. It helps determine whether a factor appears to have a statistically detectable effect on a continuous response. Teams must also consider practical significance, measurement quality, process knowledge, confidence intervals, residual behavior, and follow-up comparisons.

History

ANOVA was developed by Ronald A. Fisher in the 1920s, during his work on agricultural field trials at Rothamsted Experimental Station, and it is closely tied to the development of modern experimental design and statistical inference. It became a foundational method for agricultural experiments, industrial experiments, quality engineering, and scientific research because it allowed analysts to study multiple groups and factors more efficiently than repeated two-sample comparisons.

In Six Sigma and quality engineering, ANOVA is commonly used in the Analyze and Improve phases, especially with designed experiments, process comparison studies, supplier comparisons, machine studies, and training or method evaluations. It remains a core method because improvement teams often need to know whether observed differences can be distinguished from ordinary random variation before acting on them.

When to Use

Use ANOVA when the response is continuous and the team needs to compare means across three or more groups or across levels of one or more factors. Examples include comparing cycle time by shift, tensile strength by supplier, defect size by machine, transaction time by method, yield by process setting, or training score by instruction approach.

Do not apply ordinary ANOVA uncritically when the response is attribute data, the groups are not independent, the data are strongly nonnormal with small samples, variances are severely unequal, or the measurement system is not acceptable. In those cases, consider transformations, nonparametric methods such as the Kruskal-Wallis test, Welch's ANOVA for unequal variances, repeated-measures or mixed models for dependent groups, generalized linear models, attribute analysis, or better data collection.
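As one illustration of the nonparametric route, the Kruskal-Wallis test compares groups using ranks instead of raw values. The sketch below computes its H statistic with the Python standard library only. The function name `kruskal_wallis_h` and the sample data are hypothetical, and no tie correction is applied, so treat it as a teaching sketch and not as a substitute for a statistics package.

```python
def kruskal_wallis_h(groups):
    """Return the Kruskal-Wallis H statistic (no tie correction).

    `groups` maps a group name to a list of continuous measurements.
    """
    # Pool all observations, remembering which group each came from.
    pooled = sorted((x, name) for name, values in groups.items() for x in values)
    n_total = len(pooled)

    # Assign ranks 1..N; tied values share the average of their ranks.
    ranks = [0.0] * n_total
    i = 0
    while i < n_total:
        j = i
        while j + 1 < n_total and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    # Sum the ranks within each group.
    rank_sums = {name: 0.0 for name in groups}
    for (_, name), rank in zip(pooled, ranks):
        rank_sums[name] += rank

    weighted = sum(rank_sums[name] ** 2 / len(groups[name]) for name in groups)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical transaction times (minutes) for three methods.
transaction_minutes = {
    "method_a": [12.1, 13.4, 11.8],
    "method_b": [14.2, 15.0, 14.7],
    "method_c": [16.3, 17.1, 16.8],
}
h_statistic = kruskal_wallis_h(transaction_minutes)
print(f"H = {h_statistic:.2f}")
```

H is usually compared against a chi-square distribution with k - 1 degrees of freedom, an approximation that works better with larger groups. In practice a library routine such as `scipy.stats.kruskal` handles ties and returns the p-value directly.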

Step-by-Step

Define the question. State the response variable, factor or factors, factor levels, comparison groups, and practical decision to be made.
Plan the data collection. Use clear operational definitions, balanced sampling where practical, randomization when possible, and adequate sample size for the expected effect.
Confirm measurement quality. Review gage R&R, attribute agreement, calibration, or data capture logic before trusting group differences.
Check assumptions. Review independence, residual patterns, approximate normality of residuals, and equality of variance. ANOVA is often robust, but severe violations can mislead conclusions.
Run the ANOVA. Calculate the factor effect, within-group variation, F statistic, p-value, and percent contribution or effect size where appropriate.
Interpret statistical and practical significance. A low p-value indicates evidence of a group difference, but the team must decide whether the difference matters operationally.
Perform follow-up comparisons. If the overall test is significant, use planned contrasts or multiple-comparison methods to identify which groups differ.
Validate with process knowledge. Confirm the findings through direct observation of the process and discussion with the people who run it, especially before changing settings, suppliers, standards, or control limits.
Translate into action. Update process settings, standard work, control plans, training, supplier decisions, or future experiments based on the evidence.
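The arithmetic behind the "Run the ANOVA" step can be sketched in a few lines of standard-library Python. The function name `one_way_anova`, the dictionary keys, and the changeover-time data are all hypothetical and exist only to make the calculation concrete. The sketch covers a one-way layout and does not compute the p-value or check assumptions.

```python
from statistics import fmean


def one_way_anova(groups):
    """Compute one-way ANOVA quantities for a dict of group name -> values."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = fmean(all_values)

    # Between-group variation: how far each group mean sits from the grand mean.
    ss_between = sum(
        len(values) * (fmean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Within-group variation: scatter of observations around their own group mean.
    ss_within = sum(
        (x - fmean(values)) ** 2
        for values in groups.values()
        for x in values
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within

    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "f_statistic": ms_between / ms_within,
        # Eta squared: share of total variation attributed to the factor.
        "eta_squared": ss_between / (ss_between + ss_within),
    }


# Hypothetical changeover times (minutes) by shift.
changeover_minutes = {
    "shift_1": [20, 22, 21, 23, 24],
    "shift_2": [25, 27, 26, 28, 24],
    "shift_3": [21, 20, 22, 23, 19],
}
result = one_way_anova(changeover_minutes)
print(f"F({result['df_between']}, {result['df_within']}) = {result['f_statistic']:.2f}")
print(f"eta squared = {result['eta_squared']:.2f}")
```

For this made-up data the between-group sum of squares is 70 and the within-group sum of squares is 30, which gives F = 14.0 on 2 and 12 degrees of freedom and an eta squared of 0.70. The p-value would come from the upper tail of the corresponding F distribution, for example with `scipy.stats.f.sf`, and the assumption checks and follow-up comparisons in the other steps still apply.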

Examples

Machine comparison: A quality engineer compares part diameter across four machines. ANOVA shows a significant machine effect, and follow-up comparisons identify one machine running high. Maintenance and offset adjustment are then verified with capability data.
Supplier material study: A team compares tensile strength from three suppliers. The ANOVA indicates a statistically significant supplier effect, but one supplier also shows larger variation, prompting the team to investigate both the difference in means and the unequal variance.
Training method evaluation: A service leader compares three training methods using a continuous post-training assessment score. ANOVA helps determine whether one method produces higher knowledge retention.
Process setting experiment: A DOE evaluates temperature and pressure effects on yield. ANOVA identifies significant main effects and an interaction, leading the team to choose settings that improve yield without increasing variation.
Shift comparison: A supervisor compares average changeover time across shifts. ANOVA shows a shift difference, and process observation reveals different preparation practices.

Common Pitfalls

Using repeated t-tests instead of ANOVA. Multiple pairwise tests increase the chance of false positives unless error rates are controlled.
Ignoring practical significance. A statistically significant difference may be too small to matter operationally, especially with large samples.
Poor sampling discipline. Biased samples, mixed product families, time trends, and unrecorded process changes can create misleading differences.
Skipping residual checks. Residual plots often reveal nonconstant variance, nonnormal behavior, outliers, or missed structure.
Confusing association with cause. ANOVA on observational data can show group differences, but controlled experiments provide stronger causal evidence.
No follow-up after a significant result. The overall ANOVA indicates that at least one mean differs. It does not automatically identify which one.
Ignoring measurement variation. If the measurement system is weak, measurement noise can mask real process differences, and bias between gages or appraisers can show up as apparent group differences.
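The first pitfall, repeated t-tests, can be made concrete with a little arithmetic. The sketch below estimates how quickly the chance of at least one false positive grows with the number of pairwise comparisons. It assumes the tests are independent, which pairwise comparisons on shared groups are not, so the figures are a rough guide and not exact error rates. The function names are hypothetical.

```python
from math import comb


def pairwise_test_count(n_groups):
    """Number of distinct pairs among n_groups groups."""
    return comb(n_groups, 2)


def familywise_error_rate(n_groups, alpha=0.05):
    """Approximate chance of at least one false positive across all pairs.

    Assumes independent tests, which is a simplification.
    """
    return 1 - (1 - alpha) ** pairwise_test_count(n_groups)


def bonferroni_alpha(n_groups, alpha=0.05):
    """Per-test significance level that keeps the familywise rate near alpha."""
    return alpha / pairwise_test_count(n_groups)


for n_groups in (3, 4, 6):
    print(
        f"{n_groups} groups: {pairwise_test_count(n_groups)} pairs, "
        f"familywise rate ~ {familywise_error_rate(n_groups):.3f}, "
        f"Bonferroni alpha = {bonferroni_alpha(n_groups):.4f}"
    )
```

With four groups there are six pairs, and under this approximation the familywise rate at alpha = 0.05 is roughly 0.26. That is why an overall ANOVA followed by a controlled multiple-comparison method is preferred over a string of unadjusted t-tests.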