Understanding Data Quality: Beyond Completeness and Accuracy
Tags: data-quality, data-governance, data-management
Data quality is best defined as fitness for use and must be expressed as measurable requirements, not a vague idea of “clean data.” Using common dimensions—accuracy, completeness, consistency, timeliness, validity, and uniqueness—organizations can implement governance, controls, and monitoring that make data reliable for reporting, operations, and analytics.
Why data quality is more than “clean data”
Data quality is the degree to which data is fit for its intended use. In DAMA-DMBOK terms, data quality management is a core data management discipline that defines, measures, monitors, and improves data to meet business expectations.
Poor quality data typically shows up as:
Incorrect decisions (e.g., wrong KPIs, biased model features)
Operational failures (e.g., failed order fulfillment due to invalid addresses)
Loss of trust in analytics and self-service
A practical definition of “good” data therefore must be measurable and explicitly tied to a use case (reporting, operational processing, ML, compliance), not assumed.
Core dimensions of data quality (and how to operationalize them)
Many organizations use a set of commonly accepted dimensions to express requirements and design controls. The six dimensions below are widely used in governance and data quality practices and map well to how rules and metrics are implemented in real systems.
Accuracy
Definition: Data correctly represents the real-world entity/event it describes.
How it fails: wrong amounts, wrong customer attributes, incorrect timestamps, incorrect mappings.
How to measure: compare to an authoritative source (system of record, external validation, reconciliation); calculate error rate and impact.
Common controls: reconciliations, reference data validation, controlled vocabularies, master data management (MDM) where appropriate.
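As an illustration, the sketch below (plain Python, with hypothetical customer_id keys and a balance field) reconciles a warehouse extract against a system-of-record extract and reports an error rate; a real reconciliation would add tolerance rules and impact weighting.

```python
# Minimal accuracy check: reconcile a dataset against a system-of-record extract.
# Field names (balance) and keys are hypothetical.

warehouse = {
    "C001": {"balance": 120.00},
    "C002": {"balance": 75.50},
    "C003": {"balance": 310.00},
}
system_of_record = {
    "C001": {"balance": 120.00},
    "C002": {"balance": 80.00},   # mismatch
    "C003": {"balance": 310.00},
}

mismatches = [
    key
    for key, row in warehouse.items()
    if key in system_of_record
    and abs(row["balance"] - system_of_record[key]["balance"]) > 0.01
]

error_rate = len(mismatches) / len(warehouse)
print(f"Mismatched records: {mismatches}, error rate: {error_rate:.1%}")
```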
Completeness
Definition: Required data is present at the right level of granularity for the use case.
How it fails: nulls in required fields, missing records, partial history after a pipeline outage.
How to measure: null rate for required fields; record counts vs. expected; completeness by segment/time window.
Common controls: required-field checks, ingestion expectations (e.g., “daily file must contain all regions”), backfills with auditable lineage.
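A minimal sketch of these measures, assuming hypothetical order fields and an expected row count taken from a control file or a prior load:

```python
# Completeness check: null rate per required field and row count vs. expectation.
# Field names and the expected count are illustrative assumptions.

rows = [
    {"order_id": "O1", "customer_id": "C1", "region": "EMEA"},
    {"order_id": "O2", "customer_id": None, "region": "AMER"},
    {"order_id": "O3", "customer_id": "C3", "region": None},
]
required_fields = ["order_id", "customer_id", "region"]
expected_row_count = 4  # e.g., from yesterday's load or a control file

null_rates = {
    field: sum(1 for r in rows if r.get(field) is None) / len(rows)
    for field in required_fields
}
print("Null rate per required field:", null_rates)
print("Row count vs. expected:", len(rows), "/", expected_row_count)
```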
Consistency
Definition: Data does not contradict itself across datasets, systems, or time.
How it fails: customer status differs between CRM and billing; metric definitions differ between dashboards; different currencies without conversion.
How to measure: cross-system reconciliation; referential integrity checks; “same business concept, same definition” checks.
Common controls: canonical definitions in a semantic layer/metrics layer; conformed dimensions (Kimball); standardized transformation logic.
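For example, a simple cross-system check (system names and status values are illustrative) that flags customers whose status differs between CRM and billing:

```python
# Consistency check: does customer status agree between CRM and billing?
# The statuses and customer keys are hypothetical.

crm = {"C1": "active", "C2": "churned", "C3": "active"}
billing = {"C1": "active", "C2": "active", "C3": "active"}

conflicts = {
    cust: (crm_status, billing[cust])
    for cust, crm_status in crm.items()
    if cust in billing and billing[cust] != crm_status
}
print("Status conflicts between CRM and billing:", conflicts)
```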
Timeliness
Definition: Data is available when needed and reflects the required recency for the use case.
How it fails: late-arriving feeds; pipelines succeed but deliver after reporting deadlines; operational actions happen on stale data.
How to measure: freshness/latency (event time → availability time); SLA/SLO compliance.
Common controls: pipeline SLAs, alerting on freshness, late-data handling patterns (watermarks, reprocessing windows).
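A minimal freshness check, assuming a hypothetical two-hour SLO and illustrative timestamps:

```python
# Freshness check: latency between event time and availability downstream,
# compared against an assumed 2-hour freshness SLO.
from datetime import datetime, timedelta, timezone

FRESHNESS_SLO = timedelta(hours=2)  # hypothetical target

event_time = datetime(2024, 5, 1, 6, 0, tzinfo=timezone.utc)       # when the event occurred
available_time = datetime(2024, 5, 1, 9, 30, tzinfo=timezone.utc)  # when it landed downstream

latency = available_time - event_time
print(f"Latency: {latency}, within SLO: {latency <= FRESHNESS_SLO}")
```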
Validity
Definition: Data conforms to defined formats, types, ranges, and business rules.
How it fails: invalid dates, negative quantities where prohibited, invalid country codes, malformed emails.
How to measure: rule pass/fail rates; distribution checks (e.g., allowed values, ranges).
Common controls: schema enforcement, domain constraints, business-rule tests in transformation pipelines, reference data and code sets.
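A sketch of rule pass/fail measurement with a few illustrative format, domain, and range rules (the rule set and field names are assumptions, not a standard):

```python
# Validity checks: format, domain, and range rules with a pass rate per rule.
import re

ALLOWED_COUNTRIES = {"US", "DE", "JP"}  # stand-in for a reference code set
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

rows = [
    {"email": "a@example.com", "country": "US", "quantity": 3},
    {"email": "not-an-email",  "country": "XX", "quantity": -1},
]

rules = {
    "valid_email":      lambda r: bool(EMAIL_RE.match(r["email"])),
    "known_country":    lambda r: r["country"] in ALLOWED_COUNTRIES,
    "non_negative_qty": lambda r: r["quantity"] >= 0,
}

for name, rule in rules.items():
    pass_rate = sum(rule(r) for r in rows) / len(rows)
    print(f"{name}: {pass_rate:.0%} pass")
```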
Uniqueness
Definition: Each real-world entity/event is represented once where uniqueness is required.
How it fails: duplicate customers, repeated transactions due to retries, double-counted events.
How to measure: duplicate rate by business key; collision checks; idempotency validation.
Common controls: primary keys, deduplication logic, idempotent ingestion, survivorship rules (often connected to MDM).
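For instance, a duplicate-rate check by business key (here a hypothetical transaction_id):

```python
# Uniqueness check: duplicate rate by business key.
from collections import Counter

rows = [
    {"transaction_id": "T1", "amount": 10},
    {"transaction_id": "T2", "amount": 25},
    {"transaction_id": "T1", "amount": 10},  # retry produced a duplicate
]

counts = Counter(r["transaction_id"] for r in rows)
duplicates = {key: n for key, n in counts.items() if n > 1}
duplicate_rate = (len(rows) - len(counts)) / len(rows)
print(f"Duplicate keys: {duplicates}, duplicate rate: {duplicate_rate:.1%}")
```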
Context: “fit for use” drives the target thresholds
Data quality is not “maximum on all dimensions.” The required thresholds depend on risk, decision impact, and tolerance for delay.
Regulatory/financial reporting typically requires strict accuracy, completeness, auditability, and consistency; late delivery may be less acceptable near close.
Growth experiments and near-real-time decisioning may prioritize timeliness and tolerate limited late corrections, as long as bias and error are understood.
Operational workflows often require high validity (e.g., shipping address formats) even if some optional attributes are incomplete.
Operationalizing data quality
1) Define requirements explicitly (data contracts)
A good practice is to document, for each critical dataset and metric, which dimensions matter, the target thresholds, and the expected delivery time for its primary use case.
2) Implement layered controls
Apply checks at each stage of the pipeline, through to the serving/semantic layer: conformed definitions, metric governance, consistent filters and time logic.
Use severity levels:
Blocker: stop downstream publishing (e.g., primary key not unique)
Warning: publish with known limitations (e.g., slight timeliness breach) and notify consumers
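To make the documented requirements and rule severities concrete, the sketch below shows one possible machine-readable form; the dataset name, rules, and thresholds are hypothetical, not a standard contract format.

```python
# A sketch of documented quality requirements for one dataset, including the
# severity attached to each rule. Dataset, owner, rules, and thresholds are hypothetical.
orders_contract = {
    "dataset": "sales.orders",
    "owner": "order-management-team",
    "rules": [
        {"dimension": "uniqueness",   "check": "order_id is unique",            "severity": "blocker"},
        {"dimension": "completeness", "check": "customer_id null rate <= 0.5%", "severity": "blocker"},
        {"dimension": "timeliness",   "check": "available by 08:00 ET",         "severity": "warning"},
        {"dimension": "validity",     "check": "currency in ISO 4217 code set", "severity": "warning"},
    ],
}

# A blocker failure stops downstream publishing; a warning publishes with a notice.
blockers = [r for r in orders_contract["rules"] if r["severity"] == "blocker"]
print(f"{orders_contract['dataset']}: {len(blockers)} blocker rule(s) must pass before publishing")
```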
3) Measure, monitor, and alert (not just test)
Treat quality as observable system behavior:
Track trends (null rate, duplicate rate, freshness) over time
Set alert thresholds and escalation rules
Report on SLO attainment (e.g., “99% of days delivered by 8:00 AM ET”)
Monitoring is especially important because a test suite can pass while data is still unusable (e.g., distribution drift, upstream business process changes).
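As one example of SLO reporting, the sketch below computes attainment of a hypothetical "delivered by 8:00 AM" target from illustrative arrival times; in practice these timestamps would come from pipeline metadata.

```python
# SLO attainment sketch: share of days the daily load arrived by the 08:00 target.
from datetime import time

SLO_DEADLINE = time(8, 0)  # hypothetical delivery target

daily_arrivals = {
    "2024-05-01": time(7, 42),
    "2024-05-02": time(7, 55),
    "2024-05-03": time(9, 10),  # breach
}

on_time = sum(1 for t in daily_arrivals.values() if t <= SLO_DEADLINE)
attainment = on_time / len(daily_arrivals)
print(f"SLO attainment: {attainment:.1%} of days delivered by {SLO_DEADLINE}")
```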
4) Manage remediation with auditability
When issues occur, define how corrections are delivered:
Backfills and restatements (with clear time windows)
Versioned datasets or snapshotting for traceability
Communication of impact (which dashboards/models were affected)
This aligns with governance expectations for transparency and trust.
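One lightweight way to keep remediation auditable is to record each restatement with its affected window and downstream impact; the sketch below uses hypothetical names and is only illustrative.

```python
# An auditable restatement record: what was corrected, for which time window,
# and which downstream assets were affected. All names are hypothetical.
restatement = {
    "dataset": "sales.orders",
    "reason": "late-arriving EMEA file reprocessed",
    "affected_window": {"from": "2024-04-28", "to": "2024-04-30"},
    "corrected_at": "2024-05-02T14:05:00Z",
    "affected_consumers": ["revenue_dashboard", "demand_forecast_model"],
    "previous_version": "orders_v2024_05_01",  # snapshot kept for traceability
}
print(f"Restated {restatement['dataset']} for {restatement['affected_window']}")
```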
Common pitfalls to avoid
Treating data quality as a one-time cleanup instead of an ongoing capability
Using generic dimensions without specifying measurable rules and thresholds
Over-indexing on completeness (filling nulls) while ignoring validity/accuracy
Creating multiple conflicting metric definitions due to lack of a semantic layer or glossary
Ignoring “data at rest” quality (historical backfills) and only checking the latest load
Key takeaways
Data quality is best defined as fitness for use and operationalized through measurable rules.
The six dimensions—accuracy, completeness, consistency, timeliness, validity, uniqueness—provide a practical structure for requirements and controls.
Sustainable improvement requires governance, ownership, and monitoring, not only technical fixes.
Modern implementations benefit from explicit data contracts, layered controls, and SLO-based observability.