A data warehouse is a dedicated analytical system that integrates data from multiple operational sources, preserves history, and enables consistent reporting and BI at scale. It keeps analytical load off OLTP systems while providing governed definitions, quality controls, and repeatable transformations for enterprise analytics.
Understanding the purpose of a data warehouse
A data warehouse is a dedicated analytical data store used to support business intelligence (BI), reporting, and decision-making across an organization. It exists to separate analytical workloads from operational workloads, integrate data across multiple systems, and retain history in a consistent structure that business users can query reliably.
Operational databases are typically designed for Online Transaction Processing (OLTP): high-concurrency inserts/updates, strict transaction integrity, and predictable access patterns. Analytical workloads (Online Analytical Processing, or OLAP) behave differently: large scans, wide aggregations, and complex joins across domains. Running OLAP workloads directly on OLTP systems often creates contention (CPU, memory, locks, I/O), impacts customer-facing processes, and encourages teams to create disconnected “shadow” datasets.
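To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for both kinds of system; the orders table and its columns are purely illustrative. The point is the shape of the access, not the engine.

```python
import sqlite3

# Illustrative schema: a single orders table standing in for an operational store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "order_date TEXT, status TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [(1, 101, "2024-01-05", "shipped", 120.0),
     (2, 102, "2024-01-06", "open", 80.0),
     (3, 101, "2024-02-01", "shipped", 45.5)],
)

# OLTP-style access: touch one row by key, inside a short transaction.
with conn:
    conn.execute("UPDATE orders SET status = 'delivered' WHERE order_id = 1")

# OLAP-style access: scan and aggregate many rows across the table.
monthly_revenue = conn.execute(
    "SELECT substr(order_date, 1, 7) AS month, SUM(amount) "
    "FROM orders GROUP BY month ORDER BY month"
).fetchall()
print(monthly_revenue)  # [('2024-01', 200.0), ('2024-02', 45.5)]
```

At real volumes the second pattern competes with the first for CPU, memory, and I/O, which is exactly the contention a separate analytical system avoids.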
What “data warehouse” means (core definitions)
A widely cited foundational definition (Bill Inmon's, from classic enterprise data warehousing literature) describes a data warehouse as:
Subject-oriented: organized around key business domains (customers, products, orders), not around application tables
Integrated: data is standardized across sources (consistent identifiers, naming, units, reference data)
Time-variant: history is preserved so trends can be analyzed over time
Non-volatile: data is primarily loaded and queried; it is not continually overwritten as in operational systems
In practice, modern platforms implement these principles in different ways (cloud data warehouses, lakehouses, and hybrid architectures), but the intent remains the same: provide a trusted analytical foundation with consistent definitions and historical context.
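As a rough illustration of the "integrated" and "time-variant/non-volatile" properties, the following Python sketch standardizes records from two hypothetical source systems onto shared identifiers and units, then appends them with a load date instead of overwriting them; all field names and mappings are invented for the example.

```python
from datetime import date

# Records as they arrive from two hypothetical source systems,
# with different identifiers, field names, and units.
crm_record = {"CustNo": "C-0101", "country": "DE", "ltv_eur": 1800}
billing_record = {"customer_ref": "0101", "country_code": "DEU", "lifetime_value_cents": 182000}

def integrate(source: str, record: dict) -> dict:
    """Standardize identifiers, codes, and units onto shared conventions."""
    if source == "crm":
        return {"customer_id": record["CustNo"].removeprefix("C-"),
                "country_iso2": record["country"],
                "lifetime_value_eur": float(record["ltv_eur"])}
    if source == "billing":
        return {"customer_id": record["customer_ref"],
                "country_iso2": {"DEU": "DE"}.get(record["country_code"], "??"),
                "lifetime_value_eur": record["lifetime_value_cents"] / 100}
    raise ValueError(f"unknown source: {source}")

# Time-variant and non-volatile: each load is appended with its load date;
# earlier versions are kept rather than updated in place.
warehouse_rows: list[dict] = []
for source, record in [("crm", crm_record), ("billing", billing_record)]:
    warehouse_rows.append({**integrate(source, record), "load_date": date.today(), "source": source})
```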
How a data warehouse fits into a modern data architecture
A warehouse is not only a database technology; it is a managed analytical system with architecture, governance, and operating processes.
From an enterprise architecture perspective (aligned with TOGAF thinking), a warehouse typically includes well-defined building blocks and interfaces:
Landing/staging (raw) layer: immutable or minimally transformed data for traceability and replay
Transformation layer: standardization, enrichment, business rules, and quality controls (ETL/ELT)
Curated/modelled layer: schemas optimized for analytics (dimensional models, Data Vault, or curated 3NF)
Serving/semantic layer: business-friendly metrics and dimensions, consistent definitions, and governed self-service
Operational controls: orchestration, observability, incident management, cost controls, and access management
This layered approach supports key outcomes: auditability, maintainability, and the ability to evolve the model as the business changes.
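One way to picture the layers is as a chain of small, separately testable steps. The Python sketch below is illustrative only: the function and field names are invented, and real platforms implement these layers with warehouse SQL, orchestration tools, and transformation frameworks rather than in-process functions.

```python
import json
from datetime import datetime, timezone

def land(raw_payload: str) -> dict:
    """Landing/staging: keep the payload as received, plus minimal load metadata."""
    return {"raw": raw_payload, "loaded_at": datetime.now(timezone.utc).isoformat()}

def transform(staged: dict) -> dict:
    """Transformation: parse, standardize, and apply basic quality rules."""
    record = json.loads(staged["raw"])
    assert record.get("order_id") is not None, "quality rule: order_id must be present"
    return {
        "order_id": int(record["order_id"]),
        "customer_id": str(record["customer"]).strip().upper(),
        "amount_eur": round(float(record["amount"]), 2),
    }

def curate(clean: dict) -> dict:
    """Curated/modelled layer: shape the record for the analytical model (e.g., a fact row)."""
    return {"fact_order": clean, "grain": "one row per order"}

def serve(curated: dict) -> dict:
    """Serving/semantic layer: expose business-friendly, governed metrics."""
    return {"net_revenue_eur": curated["fact_order"]["amount_eur"]}

# Each layer is a separate, testable step, which supports auditability and replay.
payload = '{"order_id": 42, "customer": " c-0101 ", "amount": "99.90"}'
print(serve(curate(transform(land(payload)))))
```

Because each layer has a clear contract, a bad published number can be traced back to its raw payload, and the pipeline can be replayed after a fix.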
Key characteristics that differentiate warehouses from OLTP databases
Data warehouses are designed around analytical usage patterns and organizational needs:
Workload optimization for analytics: engines commonly optimize for scans, aggregations, and parallel execution; many use columnar storage and MPP, but the defining feature is OLAP-oriented performance characteristics, not a specific file format (see the sketch after this list).
Integrated business definitions: shared dimensions, standardized identifiers, and consistent metric logic reduce conflicting answers across teams.
Historical management: warehouses explicitly model time (snapshots, slowly changing dimensions, effective dating) to support trend analysis and point-in-time reporting.
Governance and control: consistent with DAMA-DMBOK data management practices, warehouses typically require defined data ownership/stewardship, metadata management, security classification, and quality monitoring.
Traceability: the ability to explain “where a number came from” through lineage, data contracts, and documented transformations.
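As a rough illustration of the scan-oriented point above, the pure-Python sketch below contrasts row-wise and column-wise layouts for the same records; real columnar engines add compression, vectorized execution, and parallelism on top of this basic idea.

```python
# The same three orders, stored row-wise (as operational systems usually do)
# and column-wise (as many analytical engines do).
rows = [
    {"order_id": 1, "customer_id": 101, "amount": 120.0},
    {"order_id": 2, "customer_id": 102, "amount": 80.0},
    {"order_id": 3, "customer_id": 101, "amount": 45.5},
]
columns = {
    "order_id": [1, 2, 3],
    "customer_id": [101, 102, 101],
    "amount": [120.0, 80.0, 45.5],
}

# Row-wise aggregation touches every field of every row.
total_row_wise = sum(r["amount"] for r in rows)

# Column-wise aggregation reads only the one column the query needs,
# which is why columnar engines scan and aggregate wide tables efficiently.
total_column_wise = sum(columns["amount"])

assert total_row_wise == total_column_wise == 245.5
```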
Data modeling approaches used in warehouses
Selecting a modeling approach is a design decision tied to the organization’s analytical use cases, rate of change, and governance needs.
Dimensional modeling (Kimball):
Uses facts and dimensions (star/snowflake schemas) to support BI and self-service.
Works well when business processes can be expressed as measurable events (orders, shipments, payments) and when consistent “conformed dimensions” are required across subject areas.
Common best practices include defining the grain up front, using surrogate keys where appropriate, and implementing slowly changing dimensions (SCD) for history.
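For example, a Type 2 slowly changing dimension keeps every version of a dimension row with its own surrogate key and validity window. The sketch below is a simplified, in-memory illustration with invented field names, not a production implementation.

```python
from datetime import date

# Current state of a hypothetical customer dimension (Type 2 SCD):
# each version of a customer gets its own surrogate key and validity window.
dim_customer = [
    {"customer_sk": 1, "customer_id": "0101", "segment": "SMB",
     "valid_from": date(2023, 1, 1), "valid_to": None, "is_current": True},
]

def apply_scd2(dim: list[dict], customer_id: str, new_attrs: dict, as_of: date) -> None:
    """Close the current version and append a new one when tracked attributes change."""
    current = next((r for r in dim if r["customer_id"] == customer_id and r["is_current"]), None)
    if current and all(current[k] == v for k, v in new_attrs.items()):
        return  # nothing changed; keep the current version
    if current:
        current["valid_to"] = as_of
        current["is_current"] = False
    dim.append({
        "customer_sk": max(r["customer_sk"] for r in dim) + 1,
        "customer_id": customer_id,
        **new_attrs,
        "valid_from": as_of, "valid_to": None, "is_current": True,
    })

# The customer moves from the SMB to the Enterprise segment on 2024-06-01;
# both versions are retained, so point-in-time reports remain correct.
apply_scd2(dim_customer, "0101", {"segment": "Enterprise"}, date(2024, 6, 1))
```

Point-in-time reports then join facts to the dimension version whose validity window covers the report date, rather than to whatever the attributes happen to be today.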
Enterprise Data Warehouse concepts (Inmon-style):
Often emphasizes a centralized, integrated warehouse (commonly in normalized form) with downstream data marts.
Can be valuable when enterprise-wide integration and consistent master/reference data are the dominant challenges.
Data Vault 2.0:
Separates concerns into hubs (business keys), links (relationships), and satellites (descriptive history).
Designed to support auditable integration from many sources and adapt to change, with curated “information marts” (often dimensional) built on top for consumption.
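The following Python sketch illustrates the hub/link/satellite split for a hypothetical customer-and-order domain, using MD5-based hash keys as is common in Data Vault 2.0 practice; all table and field names are invented for the example.

```python
import hashlib
from datetime import datetime, timezone

def hash_key(*business_keys: str) -> str:
    """Deterministic key derived from business key(s), a common Data Vault 2.0 convention."""
    return hashlib.md5("||".join(k.strip().upper() for k in business_keys).encode()).hexdigest()

now = datetime.now(timezone.utc)

# Hub: one row per business key (the customer), regardless of source.
hub_customer = {"customer_hk": hash_key("0101"), "customer_id": "0101",
                "load_date": now, "record_source": "crm"}

# A second hub for orders, and a link relating customers to orders.
hub_order = {"order_hk": hash_key("42"), "order_id": "42",
             "load_date": now, "record_source": "billing"}
link_customer_order = {"customer_order_hk": hash_key("0101", "42"),
                       "customer_hk": hub_customer["customer_hk"],
                       "order_hk": hub_order["order_hk"],
                       "load_date": now, "record_source": "billing"}

# Satellite: descriptive attributes with full history, attached to the hub.
sat_customer_details = {"customer_hk": hub_customer["customer_hk"], "load_date": now,
                        "record_source": "crm", "segment": "SMB", "country_iso2": "DE"}
```

Because hubs and links carry only keys and satellites carry history, new sources and attributes can be added without restructuring what already exists; dimensional "information marts" are then derived from these structures for consumption.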
A practical pattern in modern platforms is to combine approaches: maintain an auditable integrated layer (e.g., Data Vault or well-governed raw/standardized layers) and publish dimensional or semantic models for BI consumption.
When a data warehouse is the right investment
A warehouse is typically justified when one or more of these conditions become material:
Analytical queries impact operational performance: reporting on OLTP systems causes slowdowns, contention, or operational risk.
Multiple teams need consistent metrics: the organization needs shared definitions (e.g., “active customer,” “net revenue”) across departments.
History matters: trend analysis, cohort analysis, forecasting, and point-in-time reporting require retained and modelled historical data.
Cross-domain questions are common: users need analysis that spans multiple systems (CRM + billing + product usage).
Compliance, audit, and retention requirements: regulated reporting, retention policies, and traceability require controlled storage and repeatable calculations.
When a data warehouse may be less suitable (or premature)
A warehouse can be the wrong first step if the organization has limited analytical needs, minimal data volume/variety, or no capacity to operate pipelines and governance.
Common interim approaches include:
Read replicas and optimized reporting schemas for lightweight reporting
Operational analytics on systems designed for mixed workloads (used carefully to avoid OLTP impact)
Managed BI extracts for narrow use cases (with clear awareness of governance and duplication risks)
The key risk of delaying a warehouse too long is uncontrolled metric divergence and growing rework as more teams create incompatible datasets.
Best practices (and common pitfalls)
The effectiveness of a warehouse depends as much on management discipline as on technology.
Start with business questions and metrics, not tables: define critical metrics, dimensions, and decision workflows; document them in a glossary/catalog.
Design for consistency: implement conformed dimensions and shared metric definitions; avoid duplicating logic across dashboards.
Treat data quality as an ongoing process: implement quality controls (completeness, validity, timeliness, uniqueness, consistency) and monitor them continuously, consistent with data quality management practices in DAMA-DMBOK (see the sketch after this list).
Use an Analytics Development Lifecycle (ADLC): version control, code review, automated testing (unit + data tests), environments, and controlled releases for transformations and models.
Invest in metadata and lineage: ensure datasets, fields, and transformations are documented and discoverable; enable impact analysis before changes.
Implement least-privilege access and privacy controls: classify data, apply row/column-level security where needed, and log access for audit.
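As an illustration of the rule-based quality checks noted above, the sketch below expresses completeness, validity, uniqueness, and timeliness rules as plain Python functions; the thresholds and field names are invented, and in practice such checks typically run inside the team's existing testing or orchestration framework.

```python
from datetime import date, timedelta

# Illustrative curated rows to be checked before they are published for BI use.
rows = [
    {"order_id": 1, "customer_id": "0101", "amount_eur": 120.0, "load_date": date.today()},
    {"order_id": 2, "customer_id": "0102", "amount_eur": 80.0, "load_date": date.today()},
]

def check_completeness(rows, field):
    """Completeness: the field must be populated on every row."""
    return all(r.get(field) is not None for r in rows)

def check_validity(rows, field, predicate):
    """Validity: every value must satisfy a business rule (e.g., non-negative amounts)."""
    return all(predicate(r[field]) for r in rows)

def check_uniqueness(rows, field):
    """Uniqueness: no duplicate keys."""
    values = [r[field] for r in rows]
    return len(values) == len(set(values))

def check_timeliness(rows, field, max_age_days=1):
    """Timeliness: data must have been loaded recently enough for its consumers."""
    return all(date.today() - r[field] <= timedelta(days=max_age_days) for r in rows)

results = {
    "orders.customer_id complete": check_completeness(rows, "customer_id"),
    "orders.amount_eur non-negative": check_validity(rows, "amount_eur", lambda v: v >= 0),
    "orders.order_id unique": check_uniqueness(rows, "order_id"),
    "orders fresh within 1 day": check_timeliness(rows, "load_date"),
}
assert all(results.values()), results
```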
Common pitfalls include unclear ownership (“no one owns the metric”), over-indexing on raw ingestion without curated models (leading to self-service failure), and building “one-off marts” that cannot be maintained as the business evolves.
Summary: key takeaways
A data warehouse is a governed analytical system that integrates data across sources, preserves history, and provides consistent, business-friendly access for reporting and decision-making. The strongest warehouse implementations combine sound modeling (dimensional, Data Vault, and/or EDW patterns) with disciplined governance, quality management, and a repeatable delivery lifecycle.