Data types span both technical storage formats (e.g., SQL types) and semantic domains that define meaning, rules, and valid operations in analytics. Defining reusable domains, mapping them consistently across platforms, and enforcing them with constraints and automated tests reduces type drift, improves data quality, and supports reliable reporting.
Introduction
Data types define how data is represented, stored, validated, and processed across databases, pipelines, and analytics tools. In practice, “data type” spans both technical types (e.g., DATE, DECIMAL, VARCHAR) and semantic types (e.g., “Customer ID”, “Order Amount”, “Consent Flag”). Clear and consistent data typing reduces integration friction, improves data quality controls, and makes metrics easier to interpret.
What “data type” means (technical vs. semantic)
A complete understanding of data types requires two complementary views:
Technical (physical) data types: How a system stores and computes values (SQL types, file encodings, precision/scale, collation, nullability).
Semantic (business) data types / domains: What a value means and the rules it must follow (allowed values, format constraints, reference data, sensitivity classification).
In DAMA-DMBOK terms, technical types are part of Data Modeling & Design and Data Architecture, while semantic definitions belong in Metadata Management (business glossary, data dictionary) and are enforced through Data Quality controls.
Core categories of data types
Numeric types
Numeric typing choices affect accuracy, performance, and downstream aggregations.
Exact decimals (DECIMAL(p,s) / NUMERIC): Currency and other values requiring exact arithmetic (avoid binary floating-point rounding).
Approximate numbers (FLOAT, DOUBLE): Scientific/measurement data where small rounding error is acceptable.
Common governance practice is to define standard numeric domains (e.g., amount_currency as DECIMAL(18,2)) and apply them consistently.
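The difference between exact and approximate numerics can be shown with Python's decimal module; the quantize call mimics a DECIMAL(18,2)-style domain (the values are illustrative):

```python
from decimal import Decimal

# Binary floating-point accumulates rounding error on currency arithmetic.
float_total = 0.1 + 0.2                           # 0.30000000000000004, not 0.3
exact_total = Decimal("0.10") + Decimal("0.20")   # exactly Decimal('0.30')

# A DECIMAL(18,2)-style amount domain: round to exactly two decimal places.
amount = (Decimal("19.99") * 3).quantize(Decimal("0.01"))
print(float_total)   # 0.30000000000000004
print(exact_total)   # 0.30
print(amount)        # 59.97
```

Passing the literal as a string (`Decimal("0.10")`, not `Decimal(0.10)`) matters: constructing from a float would import the binary rounding error the exact type is meant to avoid.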
Text and code types
Text types are often used for names, descriptions, and codes.
Free text: Customer names, comments; requires collation/encoding standards (e.g., UTF-8) and often needs normalization.
Codes/identifiers: Product codes, ISO country codes; should be modeled as domains with explicit allowed patterns and reference lists.
A frequent modeling improvement is to differentiate “identifier-like text” (stable, constrained) from “free text” (unconstrained) in the semantic layer and documentation.
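A constrained identifier domain can be made executable with a simple pattern check; the product-code rule below (two uppercase letters plus six digits) is an illustrative assumption, not a real standard:

```python
import re

# Hypothetical domain rule for product codes: two uppercase letters
# followed by six digits. Free text would carry no such pattern.
PRODUCT_CODE = re.compile(r"^[A-Z]{2}\d{6}$")

def is_valid_product_code(value: str) -> bool:
    return bool(PRODUCT_CODE.match(value))

print(is_valid_product_code("AB123456"))  # True
print(is_valid_product_code("ab123456"))  # False: lowercase violates the domain
```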
Date and time types
Time is a common source of defects when types and rules are unclear.
DATE: Calendar date without time.
TIME: Time of day without date.
TIMESTAMP / DATETIME: Point in time; requires a timezone strategy.
Best practice is to standardize:
A canonical timezone for storage (often UTC) and explicit conversion rules for reporting.
ISO 8601-compatible formats at ingestion boundaries.
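Both conventions can be applied at the ingestion boundary in a few lines; this sketch parses an ISO 8601 timestamp with an offset and normalizes it to UTC for storage:

```python
from datetime import datetime, timezone

# Parse an ISO 8601 timestamp with an explicit offset at the boundary...
raw = "2024-03-15T09:30:00+02:00"
parsed = datetime.fromisoformat(raw)

# ...then convert to the canonical storage timezone (UTC).
stored = parsed.astimezone(timezone.utc)
print(stored.isoformat())  # 2024-03-15T07:30:00+00:00
```

Reporting layers then convert from UTC to local time explicitly, so the conversion rule lives in one documented place rather than in each consumer.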
Boolean and enumerations
Booleans: True/false flags (e.g., is_active). Define whether null is permitted and what null means.
Enumerations (enums): Small, controlled sets of values (e.g., order status). These should be treated as reference data with governed definitions, not ad-hoc strings.
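Treating an enum as governed reference data rather than ad-hoc strings can look like this; the order-status values are a hypothetical reference set:

```python
from enum import Enum

# Hypothetical governed reference set for order status.
class OrderStatus(Enum):
    PLACED = "placed"
    SHIPPED = "shipped"
    DELIVERED = "delivered"
    CANCELLED = "cancelled"

def parse_status(raw: str) -> OrderStatus:
    # Fails fast on values outside the reference set instead of
    # letting unknown strings flow downstream.
    return OrderStatus(raw)

print(parse_status("shipped"))  # OrderStatus.SHIPPED
# parse_status("returned") would raise ValueError: not in the reference set
```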
Structured and semi-structured types
Modern platforms often store nested data.
Arrays/structs/JSON: Useful for ingestion and flexibility, but can reduce discoverability and consistent validation if overused.
A common pattern is to ingest semi-structured data first, then promote stable fields into typed relational/columnar structures with governed domains as the model matures.
Measurement scales (analytics semantics)
In analytics, “type” also includes the measurement scale, which informs valid operations:
Nominal: Categories without order (e.g., country). Valid operations: grouping, counts.
Ordinal: Ordered categories without meaningful differences between values (e.g., satisfaction ratings). Valid operations: ranking, medians.
Interval: Numeric scale with meaningful differences but no true zero (e.g., Celsius temperature). Valid operations: addition/subtraction.
Ratio: Numeric scale with true zero (e.g., revenue, quantity). Valid operations: all arithmetic, ratios.
Documenting measurement scale in the semantic layer prevents metric misuse (for example, averaging an ordinal rating as if it were ratio data).
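One way to make documented scales executable is a small lookup that gates aggregations; the column names, scales, and rules here are a minimal sketch:

```python
# Hypothetical semantic-layer metadata: each column's measurement scale.
SCALES = {"country": "nominal", "rating": "ordinal", "revenue": "ratio"}

# Means need interval or ratio data; medians also make sense for ordinal.
ALLOWED = {
    "mean": {"interval", "ratio"},
    "median": {"ordinal", "interval", "ratio"},
}

def aggregation_is_valid(column: str, agg: str) -> bool:
    return SCALES[column] in ALLOWED[agg]

print(aggregation_is_valid("revenue", "mean"))  # True
print(aggregation_is_valid("rating", "mean"))   # False: ordinal ratings should not be averaged
```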
How data types relate to data modeling frameworks
Dimensional modeling (Kimball)
In star schemas, types should support consistent BI behavior:
Conformed dimensions should use consistent key types and stable domain definitions across subject areas.
Facts should use numeric types that support correct aggregation (e.g., DECIMAL for amounts, integer for counts) and explicitly defined additive/semi-additive behavior.
Typing is part of making a metric “BI-ready,” along with grain definition and dimensional conformance.
Inmon-style EDW concepts
In EDW layers, domains and typing standards help maintain integration across source systems:
Standardize key domains, date/time conventions, and code sets.
Use constraints and reference data to reduce semantic drift across integrated datasets.
Data Vault 2.0
Data Vault patterns depend on consistent typing for keys and audit fields:
Hubs/links: Key types and hashing strategies must be standardized (including consistent canonicalization of business keys before hashing).
Satellites: Effective dates, load timestamps, and record source fields require consistent timestamp types and timezone rules.
Typing standards here directly affect historization and reproducibility.
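The effect of canonicalizing business keys before hashing can be shown in a few lines; the trim-and-uppercase rule and the choice of MD5 are assumptions for illustration:

```python
import hashlib

# Sketch of a hub hash-key computation, assuming an agreed canonicalization
# rule (trim whitespace, uppercase) applied to every business key.
def hash_key(*business_keys: str) -> str:
    canonical = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()

# The same key in different raw forms hashes identically after canonicalization,
# which is what keeps hub loads from different sources reproducible.
print(hash_key(" ab123456 ") == hash_key("AB123456"))  # True
```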
Practical approach to defining and enforcing data types
1) Start with domains (semantic definitions)
Define a manageable set of reusable domains, for example:
customer_id: pattern, allowed length, leading zeros rule
currency_code: ISO 4217
country_code: ISO 3166-1 alpha-2
event_timestamp_utc: timezone, precision
amount_currency: precision/scale
These belong in a business glossary/data dictionary and should be referenced by models, pipelines, and tests.
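Even a plain data structure makes such a registry referenceable from pipelines and tests; the rules below are illustrative stand-ins for glossary entries:

```python
# Minimal sketch of a domain registry; rules are illustrative, and the
# currency set is a subset of ISO 4217 for brevity.
DOMAINS = {
    "customer_id":         {"pattern": r"^\d{10}$", "keep_leading_zeros": True},
    "currency_code":       {"allowed": {"USD", "EUR", "GBP"}},
    "country_code":        {"pattern": r"^[A-Z]{2}$"},  # ISO 3166-1 alpha-2 shape
    "event_timestamp_utc": {"timezone": "UTC", "precision": "milliseconds"},
    "amount_currency":     {"precision": 18, "scale": 2},
}
```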
2) Map domains to physical types per platform
For each domain, specify platform-specific implementations (warehouse, lakehouse, operational DB), including:
nullability
precision/scale
encoding/collation
partitioning/granularity implications (for time)
This aligns with enterprise architecture practices (e.g., TOGAF), where standards reduce variability across systems.
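A domain-to-platform mapping can be captured in the same registry style; the platform names and physical types below are assumed examples:

```python
# Hypothetical mapping of one semantic domain to physical types per platform.
TYPE_MAP = {
    "amount_currency": {
        "postgres":  {"type": "NUMERIC(18,2)", "nullable": False},
        "snowflake": {"type": "NUMBER(18,2)",  "nullable": False},
        "parquet":   {"type": "decimal(18,2)", "nullable": False},
    },
}

def physical_type(domain: str, platform: str) -> str:
    return TYPE_MAP[domain][platform]["type"]

print(physical_type("amount_currency", "postgres"))  # NUMERIC(18,2)
```

Generating DDL and pipeline schemas from one mapping like this, rather than retyping each column per platform, is what keeps the variability down.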
3) Enforce types through constraints and automated tests
Use layered enforcement:
Schema constraints: type declarations, check constraints, foreign keys (where feasible).
Pipeline validations: schema checks, allowed values, range checks, referential checks.
Semantic layer constraints: consistent typing and formatting for BI consumption.
This supports DAMA-DMBOK-aligned Data Quality practices by turning definitions into executable controls.
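At the pipeline layer, the validations listed above reduce to per-record checks; this sketch combines an allowed-values check with a type and range check (field names and rules are illustrative):

```python
from decimal import Decimal, InvalidOperation

# Sketch of pipeline-level validations for one record.
def validate_order(row: dict) -> list:
    errors = []
    # Allowed-values check: enum treated as reference data.
    if row.get("status") not in {"placed", "shipped", "delivered", "cancelled"}:
        errors.append(f"unexpected status: {row.get('status')!r}")
    # Type check plus range check for the amount.
    try:
        if Decimal(str(row.get("amount"))) < 0:
            errors.append("negative amount")
    except InvalidOperation:
        errors.append(f"unparseable amount: {row.get('amount')!r}")
    return errors

print(validate_order({"status": "shipped", "amount": "19.99"}))  # []
print(validate_order({"status": "returned", "amount": "oops"}))  # two errors
```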
4) Monitor type drift and contract violations
Implement monitoring for:
unexpected null rate changes
sudden increases in parsing failures (e.g., dates)
new unseen enum values
upstream schema changes
Type drift is a common root cause of downstream metric breaks.
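The first three checks above can be sketched as a batch-versus-baseline comparison; the thresholds, field names, and reference values are illustrative assumptions:

```python
# Sketch: compare a new batch against baseline expectations to flag drift.
def detect_drift(batch, baseline_null_rate, known_statuses):
    alerts = []
    # Unexpected null-rate change (tolerance of 5 percentage points).
    null_rate = sum(1 for r in batch if r.get("status") is None) / len(batch)
    if null_rate > baseline_null_rate + 0.05:
        alerts.append(f"null rate jumped to {null_rate:.0%}")
    # New, previously unseen enum values.
    unseen = {r["status"] for r in batch if r.get("status") is not None} - known_statuses
    if unseen:
        alerts.append(f"unseen enum values: {sorted(unseen)}")
    return alerts

batch = [{"status": "shipped"}, {"status": "returned"}, {"status": None}]
print(detect_drift(batch, baseline_null_rate=0.01,
                   known_statuses={"placed", "shipped"}))
```

In practice the same comparison runs per load against baselines kept in metadata, with alerts routed to the owning team before consumers see broken metrics.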
Common pitfalls and how to avoid them
Using FLOAT for money: Prefer exact decimals for financial amounts.
Inconsistent timezones: Store in a canonical timezone and document conversion logic.
Overloading strings: Treat identifiers, codes, and free text as separate semantic domains even if all are physically VARCHAR.
Silent coercions: Avoid implicit casts that mask quality issues; fail fast at boundaries.
Uncontrolled enums: Manage them as reference data with clear ownership and change control.
JSON everywhere: Use semi-structured types for ingestion and flexibility, but promote stable fields into typed models for governed analytics.
Summary
Data types are not only database column types; they include semantic domains and measurement scales that define how data should be interpreted and validated. Effective typing combines governed domain definitions, consistent platform mappings, automated enforcement, and monitoring for drift. When applied consistently across modeling layers (dimensional, EDW, Data Vault) and exposed through a semantic layer, strong typing improves data quality, usability, and trust in analytics.