Data Collection Plan

A Data Collection Plan defines what data will be collected, why it is needed, how it will be measured, who will collect it, where it will come from, and how it will support a process decision.

Metric, Measurement, Decision Support

Definition

A Data Collection Plan is a structured plan for gathering decision-useful data. It specifies the measure, operational definition, data source, sampling method, collection frequency, owner, recording method, stratification factors, measurement system needs, and intended analysis.
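As a rough illustration, the elements listed above could be captured as one structured record per measure, which makes gaps in the plan easy to spot. This is a minimal sketch, not a standard template: the class name DataCollectionPlan, the helper missing_fields, and the example values are all hypothetical.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class DataCollectionPlan:
    """One plan entry; field names mirror the elements in the definition (hypothetical layout)."""
    measure: str = ""
    operational_definition: str = ""
    data_source: str = ""
    sampling_method: str = ""
    collection_frequency: str = ""
    owner: str = ""
    recording_method: str = ""
    stratification_factors: List[str] = field(default_factory=list)
    measurement_system_needs: str = ""
    intended_analysis: str = ""


def missing_fields(plan: DataCollectionPlan) -> List[str]:
    """Return the names of plan elements that are still empty."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name)]


# Hypothetical example values, for illustration only.
plan = DataCollectionPlan(
    measure="Order cycle time (hours)",
    operational_definition="Completion time minus request received time",
    data_source="Order management system export",
    owner="Process analyst",
)

# Lists the elements the team still has to decide before collecting data.
print(missing_fields(plan))
```

A check like this is only a completeness aid. It cannot tell whether an operational definition is actually unambiguous to the people collecting the data.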

The plan prevents teams from collecting convenient data that does not answer the project question. In Six Sigma, it is especially important during the Measure and Analyze phases because poor data collection can invalidate the entire project.

History

Data collection planning comes from the scientific method, quality control, industrial statistics, and structured problem solving. As quality improvement matured, teams learned that data credibility depends on planning before measurement, not cleanup after collection.

Six Sigma formalized the data collection plan as a common project deliverable. It connects the project charter, CTQs, process map, measurement system analysis, sampling, and planned statistical analysis.

When to Use

Use a data collection plan whenever a team needs evidence for baseline performance, root cause analysis, capability, hypothesis testing, control charts, customer requirements, or improvement validation. It is useful for DMAIC projects, audits, complaints, experiments, supplier studies, service-process analysis, and management dashboards.

It is most valuable before data collection begins. Retrofitting definitions and sampling logic after data is gathered is usually weaker, slower, and harder to defend.

Step-by-Step

State the decision. Define what question the data must answer and what action may follow.
Define the measure. Specify the metric, unit, defect definition, start and stop points, and calculation method.
Identify the population and scope. Clarify products, services, customers, locations, time period, shifts, and process boundaries.
Select sampling method and size. Decide whether to use random, systematic, stratified, or continuous sampling based on risk and analysis needs.
Define collection method. Identify data source, form, system query, observation method, gauge, appraiser, or automated capture.
Check measurement system quality. Plan MSA, Gage R&R, attribute agreement, calibration, or data-entry checks where needed.
Plan stratification. Capture factors such as shift, product, supplier, machine, customer type, defect code, and operator when they may explain variation.
Assign ownership and timing. State who collects, verifies, stores, and reviews the data.
Pilot the plan. Test a small collection run, correct unclear definitions, and confirm the data can support the intended analysis.
Document and control changes. Keep the plan updated if scope, definitions, or collection methods change.
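The sampling step above names several methods. The sketch below shows one plausible way to draw random, systematic, and stratified samples from a list of records using only the Python standard library. The function names, the record layout, and the "shift" field are assumptions made for illustration, and the fixed seed is there only to make a pilot run repeatable.

```python
import random
from collections import defaultdict


def simple_random_sample(records, n, seed=0):
    """Draw n records without replacement, each equally likely."""
    return random.Random(seed).sample(records, n)


def systematic_sample(records, n, seed=0):
    """Take every k-th record after a random start, with k = len(records) // n."""
    k = len(records) // n
    if k < 1:
        raise ValueError("sample size exceeds the number of records")
    start = random.Random(seed).randrange(k)
    return records[start::k][:n]


def stratified_sample(records, key, per_stratum, seed=0):
    """Draw up to per_stratum records at random from each value of records[key]."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[record[key]].append(record)
    sample = []
    for name in sorted(strata):
        group = strata[name]
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample


# Hypothetical population: 120 jobs spread evenly across three shifts.
shifts = ("day", "evening", "night")
records = [{"id": i, "shift": shifts[i % 3]} for i in range(120)]

random_pick = simple_random_sample(records, 12)
systematic_pick = systematic_sample(records, 12)
stratified_pick = stratified_sample(records, "shift", 4)
```

Stratified sampling is the variant that guarantees every shift appears in the sample, which matters when a convenient sample would otherwise leave out night shift or other hard-to-reach groups.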

Examples

Scrap project: A team defines defect categories, part family, machine, shift, operator, material lot, and disposition codes before measuring scrap drivers.
Service cycle time: The plan defines request received time, work start time, completion time, queue time, rework flag, and order type.
Supplier quality: A study specifies incoming inspection sample size, defect definition, lot traceability, supplier code, and measurement method.
Experiment preparation: A DOE plan defines response variables, factor settings, run order, measurement frequency, and data-recording format.
Control phase: A process owner defines who records control-chart data, how often, and what reaction plan applies to signals.
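To make the service cycle time example concrete, the sketch below turns the three timestamps into queue, work, and cycle times. The formulas are an assumed operational definition (queue time as work start minus request received, cycle time as completion minus request received); a real plan would state its own start and stop points. The field names and sample values are hypothetical.

```python
from datetime import datetime


def hours_between(start, end):
    """Elapsed time between two datetimes, in hours."""
    return (end - start).total_seconds() / 3600.0


def cycle_metrics(record):
    """Compute queue, work, and cycle time under the assumed definitions."""
    received = datetime.fromisoformat(record["received"])
    started = datetime.fromisoformat(record["work_start"])
    completed = datetime.fromisoformat(record["completed"])
    # Out-of-order timestamps usually point to a recording error.
    if not received <= started <= completed:
        raise ValueError("timestamps are out of order")
    return {
        "queue_hours": hours_between(received, started),
        "work_hours": hours_between(started, completed),
        "cycle_hours": hours_between(received, completed),
    }


# Hypothetical request record, for illustration only.
request = {
    "received": "2024-03-04T09:00:00",
    "work_start": "2024-03-04T13:30:00",
    "completed": "2024-03-05T10:00:00",
    "rework": False,
    "order_type": "standard",
}

metrics = cycle_metrics(request)
print(metrics)  # queue 4.5 h, work 20.5 h, cycle 25.0 h
```

Writing the calculation down this way forces the team to agree on which timestamp starts the clock, which is exactly the kind of ambiguity an operational definition is meant to remove.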

Common Pitfalls

No operational definition. If collectors interpret the measure differently, the data cannot be trusted.
Collecting too much data. Extra fields create burden unless they support a real analysis or stratification need.
Sampling bias. Convenient samples may exclude night shift, difficult jobs, high-risk products, or peak demand.
Ignoring measurement error. Gauge, appraiser, system, and data-entry variation can overwhelm the signal.
Changing definitions midstream. Uncontrolled changes can make before-and-after comparisons invalid.
No plan for analysis. Data should be collected because it will answer a decision, not because it might be useful someday.
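For the measurement-error pitfall, one common check on attribute data is how well two appraisers agree beyond chance. The sketch below computes Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. It is an illustration with made-up ratings, not a full attribute agreement analysis, and the function name cohen_kappa is hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two appraisers rating the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both appraisers must rate the same non-empty set of items")
    n = len(ratings_a)
    # Observed agreement: share of items given the same rating by both.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each appraiser's marginal rating shares.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    categories = set(counts_a) | set(counts_b)
    p_expected = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    if p_expected == 1.0:
        # Both appraisers used a single identical category; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Made-up ratings of ten parts by two appraisers, for illustration only.
appraiser_a = ["pass", "pass", "fail", "fail", "pass", "fail", "pass", "pass", "fail", "pass"]
appraiser_b = ["pass", "pass", "fail", "pass", "pass", "fail", "pass", "fail", "fail", "pass"]

kappa = cohen_kappa(appraiser_a, appraiser_b)
print(round(kappa, 3))  # 0.583 (observed agreement 0.80, chance agreement 0.52)
```

Here the appraisers agree on 8 of 10 parts, yet kappa is only about 0.58 because much of that agreement would occur by chance. What counts as acceptable agreement depends on the team's own criteria and should be set in the plan before the study is run.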