Hypothesis Testing

Hypothesis Testing uses sample data to evaluate whether observed differences, changes, or relationships are likely to reflect real process effects rather than random variation.


Statistics, Decision Support, Six Sigma

Definition

Hypothesis Testing is a statistical decision method used to compare data against a claim, baseline, target, or another group. It states a null hypothesis and an alternative hypothesis, fixes a significance level in advance, and computes a test statistic to judge whether the evidence is strong enough to reject the null.
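As a minimal sketch of how these pieces fit together, the Python below tests an observed defect proportion against a target rate with a one-proportion z-test. The counts, the 5% target, and the function names are hypothetical and chosen only for illustration; the normal approximation it relies on is reasonable only when the sample is fairly large.

```python
import math

def normal_cdf(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def one_proportion_z_test(defects, n, p0):
    """Two-sided z-test of H0: p == p0 against H1: p != p0.

    Relies on the normal approximation, so n*p0 and n*(1-p0)
    should both be reasonably large (a common rule of thumb is >= 10).
    """
    p_hat = defects / n
    se = math.sqrt(p0 * (1.0 - p0) / n)   # standard error under H0
    z = (p_hat - p0) / se                 # test statistic
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    return z, p_value

ALPHA = 0.05  # significance level, fixed before looking at the data

# Hypothetical sample: 38 defects in 500 units, target defect rate 5%.
z, p_value = one_proportion_z_test(defects=38, n=500, p0=0.05)
reject_null = p_value < ALPHA
print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject_null}")
```

The decision rule is the comparison of the p-value with the pre-chosen significance level; the null is rejected only when the p-value falls below it.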

In improvement work, hypothesis tests help teams evaluate before-and-after changes, supplier differences, shift differences, defect patterns, customer segments, and potential root causes.

History

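Hypothesis testing grew out of early twentieth-century work on statistical inference and experimental design, notably Ronald Fisher's significance tests and the Neyman-Pearson framework of null and alternative hypotheses. Quality engineering adopted it because process teams often need to separate signal from random variation.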

Six Sigma uses hypothesis testing heavily in the Analyze and Improve phases, but good practitioners combine statistical evidence with process knowledge and practical significance.

When to Use

Use hypothesis testing when comparing means, medians, proportions, variances, counts, or relationships. It is useful for validating root causes, confirming improvement, comparing suppliers, evaluating experiments, and testing claims about process behavior.

Do not use it as a mechanical substitute for clear problem definition, measurement-system validation, representative sampling, or graphical analysis.

Step-by-Step

Define the practical question and decision.
Identify the response variable and data type.
State null and alternative hypotheses.
Choose the correct test for data type, design, assumptions, and sample structure.
Check data quality, independence, sample size, and assumptions.
Run the test and review the p-value, confidence interval, and effect size.
Interpret practical significance, not just statistical significance.
Document assumptions, conclusion, and action.
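The steps above can be sketched in Python for a simple two-group comparison. This is one possible illustration, not a prescribed method: it swaps in a permutation test for the difference in means (chosen here because it needs only the standard library and makes few distributional assumptions), and the cycle-time data, the 0.5-minute practical threshold, and all function names are hypothetical. A confidence interval is omitted to keep the sketch short.

```python
import random
import statistics

# Hypothetical cycle times in minutes (illustrative, not real data).
before = [12.1, 11.8, 12.6, 13.0, 12.4, 11.9, 12.8, 12.2, 12.7, 12.5]
after = [11.2, 11.6, 11.0, 11.9, 11.4, 11.7, 11.1, 11.8, 11.3, 11.5]

ALPHA = 0.05         # significance level, fixed in advance
PRACTICAL_MIN = 0.5  # assumed smallest reduction (minutes) worth acting on

def mean_diff(a, b):
    """Difference in sample means, a minus b."""
    return sum(a) / len(a) - sum(b) / len(b)

def permutation_p_value(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    H0: group labels are exchangeable (no difference between groups).
    """
    rng = random.Random(seed)
    observed = abs(mean_diff(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean_diff(pooled[:len(a)], pooled[len(a):])) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)

def cohens_d(a, b):
    """Standardized effect size using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return mean_diff(a, b) / pooled_var ** 0.5

diff = mean_diff(before, after)
p_value = permutation_p_value(before, after)
d = cohens_d(before, after)

statistically_significant = p_value < ALPHA
practically_significant = diff >= PRACTICAL_MIN
print(f"mean reduction = {diff:.2f} min, p = {p_value:.4f}, d = {d:.2f}")
print(f"statistical: {statistically_significant}, "
      f"practical: {practically_significant}")
```

Keeping the statistical and practical checks as separate flags mirrors the step list: a small p-value answers whether the change is distinguishable from noise, while the threshold comparison answers whether it is large enough to matter.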

Examples

Before/after: A two-sample t-test (or a paired t-test when the same units are measured before and after) evaluates whether cycle time changed after standard work was introduced.
Defect rate: A two-proportion test compares supplier defect rates.
Multiple groups: ANOVA compares mean strength across three process settings.
Categorical pattern: Chi-square testing checks whether defect type depends on shift.
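Two of these examples can be worked through with the standard library alone. The sketch below assumes hypothetical supplier counts and a hypothetical 2 x 3 table of defect types by shift; the function names are illustrative. The chi-square p-value uses the closed form exp(-x/2), which holds only for 2 degrees of freedom, so other table sizes would need a general chi-square distribution function from a statistics library.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test of H0: p1 == p2 (normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical supplier data: defects / units inspected.
z, p_suppliers = two_proportion_z_test(x1=18, n1=400, x2=35, n2=420)
print(f"suppliers: z = {z:.2f}, p = {p_suppliers:.4f}")

# Hypothetical defect counts by shift (rows) and defect type (columns).
defects_by_shift = [
    [30, 20, 10],  # day shift: scratch, dent, misalignment
    [15, 25, 20],  # night shift
]
chi2, dof = chi_square_independence(defects_by_shift)
assert dof == 2  # the closed-form p-value below is valid only for 2 df
p_shift = math.exp(-chi2 / 2)
print(f"shift vs defect type: chi2 = {chi2:.2f}, dof = {dof}, p = {p_shift:.4f}")
```

Both tests depend on large-sample approximations, so small expected counts (a common rule of thumb is below 5 per cell) call for an exact alternative.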

Common Pitfalls

Testing dirty or biased data.
Ignoring practical significance.
Using the wrong test for paired, independent, normal, non-normal, or categorical data.
Running many tests without a plan.
Confusing failure to reject the null with proof of no difference.
Skipping process understanding after a significant result.
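One way to guard against the many-tests pitfall is to decide on a multiple-comparison correction before running the tests. The sketch below uses the Holm step-down procedure as one common choice; the p-values and the function name are hypothetical and serve only to show how the conclusions can change.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure.

    Returns a list of booleans (reject H0?) in the same order as the input.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is compared with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values also fail
    return reject

# Hypothetical p-values from five planned comparisons.
p_values = [0.004, 0.030, 0.041, 0.20, 0.012]

naive = [p < 0.05 for p in p_values]   # each test judged in isolation
holm = holm_reject(p_values)           # family-wise error rate controlled

print("naive:", naive)
print("holm: ", holm)
```

In this illustration four comparisons look significant when judged one at a time, but only two survive the correction.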