Data modeling defines how data is structured, related, and constrained so it can be stored, integrated, and used reliably. This article introduces core modeling concepts, the conceptual/logical/physical levels, and common approaches such as normalized modeling, dimensional modeling, and Data Vault, with practical guidance for building governable, analytics-ready datasets.
Introduction: why data modeling matters
Data modeling is the discipline of defining how an organization’s data is structured, related, and constrained so it can be stored, integrated, and used reliably. In DAMA-DMBOK terms, data modeling is a core activity within Data Architecture and Data Modeling & Design, and it directly supports downstream capabilities such as data integration, governance, metadata management, analytics, and data quality.
A “good” data model reduces ambiguity (shared definitions), improves interoperability (consistent keys and relationships), and enables scalable analytics (clear facts, dimensions, and grain). Poor modeling typically shows up later as reconciliation issues, duplicated metrics, brittle pipelines, and low trust in reporting.
Core concepts and definitions
A practical foundation starts with shared terminology.
Entity: a thing the business cares about (Customer, Order, Product).
Attribute: a property of an entity (Customer Email, Order Date).
Relationship: how entities connect (Customer places Order).
Key: an attribute (or set of attributes) that uniquely identifies a record.
Grain: what one row represents (one row per order line, per daily product snapshot, etc.).
Business definition: the agreed meaning of a concept (e.g., “Active Customer”).
Modeling is not only about database tables. It also includes the semantic meaning of data (definitions, allowable values, and rules) so that analytics and operational use cases interpret data consistently.
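To make these terms concrete, here is a minimal sketch in Python; the Customer and OrderLine names and fields are invented for illustration, not taken from any particular system:

```python
from dataclasses import dataclass
from datetime import date

# Entities: things the business cares about. Attributes are their properties.
@dataclass(frozen=True)
class Customer:
    customer_id: str          # key: uniquely identifies one customer
    email: str                # attribute

@dataclass(frozen=True)
class OrderLine:
    order_id: str             # part of the key
    line_number: int          # part of the key: grain is one row per order line
    customer_id: str          # relationship: Customer places Order
    order_date: date          # attribute
    quantity: int             # attribute
    net_amount: float         # attribute

# Grain statement made explicit: one OrderLine row represents exactly one
# product line on one order, identified by (order_id, line_number).
line = OrderLine("SO-1001", 1, "C-42", date(2024, 3, 1), 2, 59.90)
print(line)
```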
Levels of modeling: conceptual, logical, physical
Most established modeling practices separate work into levels, which helps stakeholders review the model at the right depth.
Conceptual data model: business-facing overview of major entities and relationships. It is used to align stakeholders and scope domains without implementation details.
Logical data model: detailed representation of entities, attributes, keys, and relationships independent of any specific technology. It introduces normalization decisions and business rules more explicitly.
Physical data model: implementation-specific design (tables, columns, data types, indexes, partitions, constraints). It reflects the chosen platform (e.g., Postgres, Snowflake, BigQuery) and performance considerations.
Keeping these levels distinct improves reviewability and governance: business stakeholders can validate meaning at the conceptual/logical levels, while data engineers and DBAs optimize the physical layer.
Data modeling approaches (and when to use them)
Modern organizations typically combine multiple modeling styles across the data lifecycle (source systems, integration layer, analytics layer). The key is choosing the right approach for the job.
Normalized modeling (3NF) for operational and integrated data
A normalized model (often associated with third normal form, 3NF) minimizes redundancy and supports transactional consistency.
Strengths: reduces update anomalies, supports integrity constraints, works well for operational workloads and master/reference data.
Common uses: operational databases, master data management (MDM) hubs, and some enterprise data warehouse (EDW) designs.
Risks for analytics: analytics queries can become complex due to many joins, and business-friendly reporting often requires additional structures.
In an Inmon-style EDW approach, an integrated enterprise warehouse is often modeled in a normalized form, with downstream data marts shaped for specific analytics needs.
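As a rough illustration of what normalization does, the following pandas sketch (with invented column names) decomposes a denormalized order feed into separate customer, product, order, and order-line tables so that each fact is stored once and referenced by key:

```python
import pandas as pd

# A denormalized feed that repeats customer and product details on every row.
flat = pd.DataFrame({
    "order_id":       ["SO-1", "SO-1", "SO-2"],
    "customer_id":    ["C-42", "C-42", "C-7"],
    "customer_email": ["a@x.com", "a@x.com", "b@y.com"],
    "product_id":     ["P-1", "P-2", "P-1"],
    "product_name":   ["Widget", "Gadget", "Widget"],
    "quantity":       [2, 1, 5],
})

# 3NF-style decomposition: redundancy is removed, which also removes the
# update anomalies that come with it.
customers   = flat[["customer_id", "customer_email"]].drop_duplicates()
products    = flat[["product_id", "product_name"]].drop_duplicates()
orders      = flat[["order_id", "customer_id"]].drop_duplicates()
order_lines = flat[["order_id", "product_id", "quantity"]]

print(customers, products, orders, order_lines, sep="\n\n")
```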
Dimensional modeling (Kimball) for analytics and BI
Dimensional modeling is designed for analytical workloads and decision support, organizing data into facts and dimensions.
Fact table: quantitative measurements/events at a defined grain (sales amount per order line, page views per session).
Dimension table: descriptive context used for filtering/grouping (customer, product, date, region).
Key practices in dimensional modeling include:
Declare the grain first: ensure everyone agrees what one fact row represents.
Conformed dimensions: reuse the same dimensions across subject areas to ensure consistent reporting.
Slowly Changing Dimensions (SCD): manage history in dimensions (e.g., Type 2 to preserve changes over time).
Dimensional modeling typically provides the best usability and performance for BI and self-service analytics because it matches how people ask questions (“sales by product by month”).
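The sketch below, using pandas and invented tables, shows why this shape maps directly onto a question like "sales by product by month": the fact table is joined to its conformed dimensions and grouped by the descriptive attributes.

```python
import pandas as pd

# Fact table: one row per order line (the declared grain), keyed by surrogate keys.
fact_sales = pd.DataFrame({
    "date_key":     [20240105, 20240212, 20240218],
    "product_key":  [1, 2, 1],
    "sales_amount": [59.90, 120.00, 29.95],
})

# Conformed dimensions: reusable descriptive context for filtering and grouping.
dim_date = pd.DataFrame({
    "date_key": [20240105, 20240212, 20240218],
    "month":    ["2024-01", "2024-02", "2024-02"],
})
dim_product = pd.DataFrame({
    "product_key":  [1, 2],
    "product_name": ["Widget", "Gadget"],
})

# "Sales by product by month" is a join to the dimensions plus a group-by,
# which is why the structure matches how business questions are asked.
report = (
    fact_sales
    .merge(dim_date, on="date_key")
    .merge(dim_product, on="product_key")
    .groupby(["product_name", "month"], as_index=False)["sales_amount"].sum()
)
print(report)
```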
Data Vault 2.0 for scalable, auditable integration
Data Vault is an integration modeling approach built around three core constructs:
Hubs: business keys (e.g., Customer ID from source).
Links: relationships between hubs (Customer–Order relationship).
Satellites: descriptive attributes and history (customer name changes, status).
Data Vault is often selected when integration must handle many sources, frequent change, and strong auditability/traceability requirements.
Strengths: handles evolving sources, supports historical tracking, can reduce rework during source changes.
Trade-offs: not inherently business-friendly for BI; it usually feeds dimensional marts or a semantic layer for consumption.
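A simplified sketch of the three constructs, assuming a hash-key convention and invented source fields (real Data Vault implementations vary in their hashing and load-metadata standards):

```python
import hashlib
from datetime import datetime, timezone

def hash_key(*parts: str) -> str:
    """Deterministic hash of business key parts (a common Data Vault convention)."""
    return hashlib.md5("||".join(p.strip().upper() for p in parts).encode()).hexdigest()

load_ts = datetime.now(timezone.utc).isoformat()
source = "crm"

# Hub: one row per business key.
hub_customer = {"customer_hk": hash_key("C-42"), "customer_id": "C-42",
                "load_ts": load_ts, "record_source": source}

# Link: relationship between hubs (Customer places Order).
link_customer_order = {"customer_order_hk": hash_key("C-42", "SO-1001"),
                       "customer_hk": hash_key("C-42"),
                       "order_hk": hash_key("SO-1001"),
                       "load_ts": load_ts, "record_source": source}

# Satellite: descriptive attributes with history; hash_diff detects changes.
attrs = {"customer_name": "Ada Lovelace", "status": "active"}
sat_customer = {"customer_hk": hub_customer["customer_hk"],
                "hash_diff": hash_key(*attrs.values()),
                **attrs, "load_ts": load_ts, "record_source": source}

print(hub_customer, link_customer_order, sat_customer, sep="\n")
```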
Choosing a modeling strategy: decision factors
Use these factors to guide selection (often per layer):
Primary workload: transactional integrity (normalized) vs analytics usability (dimensional).
Change and audit needs: high source volatility and lineage requirements (Data Vault) vs stable curated marts (dimensional).
Governance and reuse goals: conformed dimensions and shared metrics suggest dimensional and semantic modeling investments.
Skill and tooling: success depends on consistent implementation, documentation, and enforcement.
Data quality is enforced (or undermined) by the model
A data model is a control mechanism for data quality and governance, not just a diagram.
Accuracy is supported indirectly through source-of-truth decisions, reference data, and validation rules.
Completeness is supported through required fields, defaults, and monitoring expectations.
Consistency is supported by standardized data types, conformed dimensions, and shared keys.
Timeliness is influenced by pipeline design and update patterns (e.g., snapshots vs incremental loads).
Validity is enforced via domain constraints, allowable values, and referential integrity.
Uniqueness is enforced through primary keys, unique constraints, and deduplication rules.
If these rules are not represented somewhere (database constraints, transformation logic, or data quality tests), data quality becomes an informal expectation rather than an enforceable property.
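One lightweight way to make such rules executable, sketched here with pandas and invented tables, is to express each quality dimension as a check that returns a count of violations:

```python
import pandas as pd

orders = pd.DataFrame({
    "order_id":    ["SO-1", "SO-2", "SO-2"],    # duplicate key on purpose
    "customer_id": ["C-42", "C-7", None],       # missing foreign key
    "status":      ["shipped", "paid", "??"],   # invalid code
})
customers = pd.DataFrame({"customer_id": ["C-42", "C-7"]})
ALLOWED_STATUS = {"created", "paid", "shipped", "cancelled"}

issues = {
    # Uniqueness: the primary key must identify exactly one row.
    "duplicate_order_id": int(orders["order_id"].duplicated().sum()),
    # Completeness: required fields must be populated.
    "missing_customer_id": int(orders["customer_id"].isna().sum()),
    # Validity: values must come from the agreed domain.
    "invalid_status": int((~orders["status"].isin(ALLOWED_STATUS)).sum()),
    # Consistency / referential integrity: foreign keys must resolve.
    "orphan_customer_id": int(
        (~orders["customer_id"].isin(customers["customer_id"])
         & orders["customer_id"].notna()).sum()
    ),
}
print(issues)
```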
Practical workflow: from requirements to a maintainable model
A repeatable workflow improves consistency and reduces rework.
1) Start with use cases and definitions
Before drawing tables, define:
Key business questions (reporting, operational decisions, ML features).
Definitions of critical measures and dimensions (e.g., “Net Revenue,” “Active User”).
Required history behavior (do we need point-in-time accuracy?).
This step aligns to governance practices: definitions and ownership should be captured as metadata and reviewed.
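For example, definitions can be captured as simple, reviewable metadata; the entries below are illustrative only:

```python
# A lightweight registry of business definitions, captured as reviewable metadata.
definitions = {
    "Active Customer": {
        "definition": "Customer with at least one completed order in the last 90 days.",
        "owner": "Customer Analytics",        # accountable steward
        "history": "point-in-time required",  # drives SCD/snapshot decisions
    },
    "Net Revenue": {
        "definition": "Gross revenue minus discounts, refunds, and taxes.",
        "owner": "Finance",
        "history": "restated monthly",
    },
}

for name, meta in definitions.items():
    print(f"{name}: {meta['definition']} (owner: {meta['owner']})")
```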
2) Identify subject areas and bounded domains
Organize modeling work by domain (Customer, Product, Finance, Supply Chain) to avoid a monolithic design. In enterprise architecture terms (e.g., TOGAF), this is part of establishing a data architecture baseline and target, with clear boundaries and integration points.
3) Design the grain and keys
For each core dataset:
Declare the grain explicitly.
Choose stable keys and document key strategy:
Natural keys: meaningful business identifiers (can change and vary by source).
Surrogate keys: system-generated identifiers (common in dimensional models to manage history and performance).
Define how keys are generated, reconciled, and mapped across sources (critical for integration).
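The sketch below (with invented identifiers) illustrates one common pattern: a mapping table that assigns a shared surrogate key to natural keys arriving from different sources, so the same business entity resolves to one key downstream.

```python
import pandas as pd

# Natural keys arrive from sources and may differ in format between systems.
source_customers = pd.DataFrame({
    "source_system": ["crm", "billing"],
    "natural_key":   ["C-42", "0042"],
})

# A mapping table assigns one stable surrogate key per reconciled business entity;
# here both source records are known to represent the same customer.
key_map = pd.DataFrame({
    "source_system": ["crm", "billing"],
    "natural_key":   ["C-42", "0042"],
    "customer_sk":   [1001, 1001],   # surrogate key shared across sources
})

resolved = source_customers.merge(key_map, on=["source_system", "natural_key"], how="left")
print(resolved)
```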
4) Document transformations and lineage
For each dataset, capture source-to-target mappings and the transformation rules applied (standardization, deduplication, currency conversion, time zone rules).
Lineage supports auditability, root-cause analysis, and governance, and it is essential for reliable analytics.
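A minimal pandas sketch of typical transformation rules, with invented data and the assumption that source timestamps are already in UTC:

```python
import pandas as pd

raw = pd.DataFrame({
    "email":      [" Ada@Example.COM ", "ada@example.com", "grace@example.com"],
    "amount_usd": [10.0, 10.0, 25.0],
    "created_at": ["2024-03-01 23:30:00", "2024-03-01 23:30:00", "2024-03-02 08:00:00"],
})

cleaned = (
    raw
    # Standardization rule: trim and lower-case emails before matching.
    .assign(email=lambda df: df["email"].str.strip().str.lower())
    # Time zone rule: store timestamps as UTC (source assumed to send UTC here).
    .assign(created_at=lambda df: pd.to_datetime(df["created_at"], utc=True))
    # Deduplication rule: one row per (email, created_at) after standardization.
    .drop_duplicates(subset=["email", "created_at"])
)
print(cleaned)
```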
5) Implement with testing and documentation
In modern analytics engineering practices, modeling is paired with automated tests and documentation:
Schema tests: uniqueness and not-null checks on keys.
Reconciliation tests: totals vs source, row-count expectations.
Documentation should include grain, business definitions, owners, and refresh cadence.
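A bare-bones illustration of such tests in Python, using invented tables and plain assertions; in practice these checks would typically run in a test framework or a dedicated data testing tool:

```python
import pandas as pd

dim_customer = pd.DataFrame({"customer_sk": [1, 2, 3], "customer_id": ["C-1", "C-2", "C-3"]})
source_rows, warehouse_rows = 3, 3
source_total, warehouse_total = 1250.00, 1250.00

# Schema tests: key columns must be unique and not null.
assert dim_customer["customer_sk"].is_unique, "duplicate surrogate keys"
assert dim_customer["customer_sk"].notna().all(), "null surrogate keys"

# Reconciliation tests: row counts and totals must match the source (within tolerance).
assert source_rows == warehouse_rows, "row count drift vs source"
assert abs(source_total - warehouse_total) < 0.01, "amount total drift vs source"

print("all model tests passed")
```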
Modeling for self-service analytics: semantic layer considerations
Even with strong physical models, self-service analytics often fails without a consistent semantic layer. A well-governed semantic layer should:
Standardize metric definitions and calculation logic.
Provide business-friendly naming and descriptions.
Manage join paths intentionally to avoid double counting.
Control access and privacy through governed dimensions and role-based policies.
The goal is to separate “how data is stored” from “how data is understood and queried,” improving consistency across dashboards, notebooks, and downstream applications.
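As a sketch of the idea (the metric registry and query helper below are invented for illustration), a semantic layer can define each metric once and control how it may be sliced:

```python
import pandas as pd

# One governed definition per metric: name, calculation, and allowed dimensions.
METRICS = {
    "net_revenue": {"column": "net_amount", "agg": "sum",
                    "dimensions": {"product_name", "month"}},
}

fact = pd.DataFrame({
    "product_name": ["Widget", "Widget", "Gadget"],
    "month":        ["2024-01", "2024-02", "2024-02"],
    "net_amount":   [59.90, 29.95, 120.00],
})

def query(metric: str, by: list[str]) -> pd.DataFrame:
    """Resolve a metric through its single governed definition."""
    spec = METRICS[metric]
    unknown = set(by) - spec["dimensions"]
    if unknown:
        # Controlled join paths: disallow slicing that could double count.
        raise ValueError(f"{metric} cannot be sliced by {unknown}")
    return fact.groupby(by, as_index=False)[spec["column"]].agg(spec["agg"])

print(query("net_revenue", ["product_name", "month"]))
```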
Common pitfalls (and how to avoid them)
Unclear grain: leads to double counting and inconsistent metrics. Always document grain and enforce it with keys/tests.
Mixing operational and analytical patterns: operational schemas rarely satisfy BI usability without additional modeling; plan an analytics layer.
Inconsistent keys across systems: invest in master/reference data, mapping tables, and clear key management.
Over-modeling too early: start with a minimal viable model aligned to prioritized use cases, then evolve with governance controls.
No ownership or definitions: without data stewardship and a glossary, models become ambiguous and contested.
Ignoring history requirements: decide early where and how to track changes (SCD, snapshots, Data Vault satellites), and validate point-in-time reporting needs.
Summary: key takeaways
Data modeling defines structure and meaning so data can be integrated, governed, and used consistently.
Separate conceptual, logical, and physical modeling to improve stakeholder alignment and implementation quality.
Use the right modeling approach by layer: normalized for operational integrity, dimensional for analytics consumption, and Data Vault for scalable integration and auditability.
Treat the model as a data quality control surface: keys, constraints, and tests make quality measurable and enforceable.
Pair models with documentation, lineage, and a semantic layer to scale self-service and maintain metric consistency.