AI & ML · 11 min
ML in Production: The Hard Parts
mlops · data-governance · machine-learning-production
Putting an ML model into production requires far more than exporting a trained artifact: it demands consistent feature computation, rigorous versioning and lineage, safe deployment practices, and continuous monitoring across data, model behavior, and system health. Treating ML as a governed lifecycle (data + model + operations) is essential to maintain reliability as data and business conditions change.
Context and problem statement
Many machine learning models look accurate in notebooks but underperform or fail once integrated into real systems. The root cause is rarely “the algorithm” alone; production ML is a socio-technical system that must reliably convert changing, messy operational data into predictions, at required latency and scale, with traceability, controls, and continuous feedback.
A practical way to frame the “hard parts” is to treat a model as part of a governed data product: it has inputs (features), transformations, an interface (serving), quality expectations (SLOs), and lifecycle management (change control).
Core concepts and definitions (production-focused)
Training vs. serving: Training typically uses historical datasets, while serving consumes live events or operational records. A model that performs well offline can fail online due to differences in data availability, timing, or preprocessing.
Training-serving skew: Any mismatch between how features are computed in training and in production (different logic, different reference data, different time windows, missing values handled differently).
Data drift and concept drift:
Data drift (covariate shift) is a change in the distribution of inputs.
Concept drift is a change in the relationship between inputs and outcomes.
Both can degrade performance and should be explicitly monitored.
Point-in-time correctness: When building training sets from historical data, feature values must reflect only what was known at prediction time (preventing target leakage and backfill bias); a small sketch follows this list.
Model artifact and lineage: The model binary plus its dependencies (code, parameters, training data version, feature definitions, environment) required for reproducibility and auditability.
Operational SLOs: Latency, throughput, availability, and error budgets for prediction services.
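To make point-in-time correctness concrete, here is a minimal sketch using pandas merge_asof; the tables, column names, and the customer-spend feature are illustrative assumptions, not a prescribed schema.

```python
import pandas as pd

# Labeled prediction events: what we want the model to predict, and when.
events = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "event_ts": pd.to_datetime(["2024-03-01", "2024-04-01", "2024-03-15"]),
    "label": [0, 1, 0],
})

# Feature snapshots, stamped with the time each value became known.
features = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "feature_ts": pd.to_datetime(["2024-02-20", "2024-03-20", "2024-03-10"]),
    "avg_spend_30d": [42.0, 55.5, 13.2],
})

# merge_asof joins each event to the most recent feature value at or before
# event_ts, so no future information leaks into the training set.
training_set = pd.merge_asof(
    events.sort_values("event_ts"),
    features.sort_values("feature_ts"),
    left_on="event_ts",
    right_on="feature_ts",
    by="customer_id",
    direction="backward",
)
print(training_set[["customer_id", "event_ts", "avg_spend_30d", "label"]])
```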
Lifecycle and governance foundations
A production ML system spans multiple disciplines, each covered by established practices:
Lifecycle management (MLOps/DevOps principles): Treat ML assets (code, data, features, models) as versioned, testable, deployable units with automated pipelines and controlled releases.
Data management (DAMA-DMBOK): Apply data governance, data quality management, metadata management, and master/reference data practices to the data powering the model.
Architecture and integration (TOGAF): Define clear architecture building blocks and interfaces across data sources, feature pipelines, serving, monitoring, and downstream applications.
Analytics/engineering lifecycle alignment (ADLC-style thinking): Manage requirements, development, testing, deployment, and operations as a continuous lifecycle rather than a one-time handoff.
The hard parts in practice
1) Feature engineering is an engineering system, not a notebook step
The most common production failures come from feature computation:
Dual pipelines: One pipeline generates features for training (batch) and another for serving (streaming/online). Divergent logic creates skew.
Freshness vs. correctness trade-offs: Real-time features require up-to-date data, but late-arriving events can corrupt aggregates.
Join complexity: Online joins across multiple operational stores can be slow or unreliable.
Practical patterns:
Single-source feature definitions: Define feature logic once and reuse it for both training and serving (or compile both paths from the same semantic definition); a minimal sketch follows this list of patterns.
Feature store or feature registry: Use a managed mechanism to store feature definitions, enforce consistency, and support both offline and online access.
Time-windowed aggregates with clear event-time semantics: Track event time vs. processing time, handle late data explicitly, and keep audit fields for recomputation.
Data quality gates on features: Validate ranges, null rates, categorical cardinality, and schema changes before features reach training or serving.
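As referenced above, here is a minimal sketch of a single feature definition shared by the batch and online paths, plus a simple quality gate. The feature name, the call sites, and the range threshold are illustrative assumptions.

```python
from datetime import datetime
from typing import Optional

def days_since_last_order(last_order_ts: Optional[datetime],
                          as_of: datetime) -> Optional[float]:
    """Single source of truth for the feature; called by both pipelines."""
    if last_order_ts is None or last_order_ts > as_of:
        return None  # missing or late-arriving data handled one way, everywhere
    return (as_of - last_order_ts).total_seconds() / 86400.0

# Offline (training): apply the same function over a historical DataFrame, e.g.
# df["days_since_last_order"] = df.apply(
#     lambda r: days_since_last_order(r["last_order_ts"], r["event_ts"]), axis=1)

# Online (serving): apply it to a single live record at request time.
def serve_features(record: dict, now: datetime) -> dict:
    return {"days_since_last_order":
            days_since_last_order(record.get("last_order_ts"), now)}

# A minimal data quality gate before the feature reaches training or serving.
def feature_gate(value: Optional[float]) -> bool:
    return value is None or 0.0 <= value <= 3650.0  # range check; tune per feature
```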
2) Reproducibility and lineage require versioning beyond “the model file”
Model performance depends on data and code, not just the model file. Versioning should cover the following (a minimal lineage-record sketch appears after this list):
Training dataset snapshot/version (including sampling rules and filters)
Feature definitions and reference data versions
Training code and configuration (hyperparameters, random seeds)
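A minimal sketch of what a lineage record for one training run might capture, written out as JSON; in practice an experiment tracker or model registry would store the same fields. All names and values below are illustrative assumptions.

```python
import json, hashlib, platform
from datetime import datetime, timezone

manifest = {
    "run_at": datetime.now(timezone.utc).isoformat(),
    "training_data_version": "orders_snapshot_2024_06_01",  # assumed dataset name
    "feature_definitions_version": "features_v12",          # assumed registry tag
    "code_commit": "abc1234",                                # e.g. a git SHA
    "hyperparameters": {"max_depth": 6, "learning_rate": 0.1},
    "random_seed": 42,
    "environment": {"python": platform.python_version()},
}

# Hash the manifest so any later change to the recorded inputs is detectable.
manifest["fingerprint"] = hashlib.sha256(
    json.dumps(manifest, sort_keys=True).encode()).hexdigest()

with open("training_run_manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)
```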
Define alert thresholds and runbooks tied to business impact (a drift-score threshold is sketched after this list).
Use champion–challenger comparisons where possible (compare current model to baseline).
Monitor data quality SLAs for upstream sources, not only the model endpoint.
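One way to operationalize a drift alert threshold, sketched here with the Population Stability Index (PSI) computed in NumPy. The 0.2 alert level is a common rule of thumb rather than a universal standard, and the feature samples are synthetic.

```python
import numpy as np

def psi(baseline: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """PSI between a baseline (training-time) and current feature sample."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    expected, _ = np.histogram(baseline, bins=edges)
    actual, _ = np.histogram(current, bins=edges)
    # Convert counts to proportions; a small epsilon avoids log(0) and divide-by-zero.
    eps = 1e-6
    e = np.clip(expected / expected.sum(), eps, None)
    a = np.clip(actual / actual.sum(), eps, None)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 5000)   # distribution seen at training time
current = rng.normal(0.4, 1.0, 5000)    # shifted live distribution

score = psi(baseline, current)
if score > 0.2:                          # assumed alert threshold
    print(f"PSI={score:.3f}: drift alert, follow the drift runbook")
```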
5) Retraining is a controlled pipeline, not an ad hoc refresh
Retraining strategies should be explicit and auditable:
Time-based retraining (e.g., monthly) when gradual drift is expected and labels arrive regularly.
Drift-triggered retraining when data distribution changes exceed thresholds.
Performance-triggered retraining when validated metrics degrade (requires reliable ground truth).
Pipeline best practices:
Separate candidate model training from promotion to production with approval gates (a minimal promotion check is sketched after this list).
Keep a stable baseline model for comparison.
Re-run data validation and unit/integration tests on every training run.
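A minimal sketch of a promotion (champion-challenger) gate, under the assumptions that AUC is the validated metric and that a 0.005 improvement margin is required; a human approval step would typically follow the automated check.

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    auc: float
    data_validation_passed: bool

def should_promote(candidate: EvalResult, baseline: EvalResult,
                   min_improvement: float = 0.005) -> bool:
    if not candidate.data_validation_passed:
        return False  # never promote on top of failed data checks
    # Promote only if the challenger clearly beats the stable baseline.
    return candidate.auc >= baseline.auc + min_improvement

if should_promote(EvalResult(auc=0.871, data_validation_passed=True),
                  EvalResult(auc=0.862, data_validation_passed=True)):
    print("Candidate queued for approval and staged rollout")
```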
6) Explainability, privacy, and risk controls are production requirements
Many organizations need governance controls comparable to those applied to other critical systems:
Explainability: Provide model-appropriate explanations (global feature importance, local explanation methods) and document their limitations; a global-importance sketch follows this list.
Compliance and privacy: Minimize sensitive attributes, enforce access controls, and follow data retention policies.
Model risk management: Document intended use, known failure modes, and monitoring coverage; define escalation paths when harm or compliance risk is detected.
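For global explanations, here is a sketch using scikit-learn's permutation importance on a synthetic dataset. The model and data are placeholders, and local explanation methods (per-prediction attributions) would need a separate tool.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Permutation importance: how much does shuffling each feature hurt the score?
result = permutation_importance(model, X_test, y_test, n_repeats=10,
                                random_state=0)
for i, mean_drop in enumerate(result.importances_mean):
    print(f"feature_{i}: score drop {mean_drop:.3f}")
```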
A reference architecture checklist (what to implement)
Data and feature layer
Offline training dataset builder with point-in-time correctness
Online feature retrieval designed for latency and reliability
Data quality checks and schema enforcement
Model lifecycle layer
Experiment tracking (parameters, data versions, metrics)
Model registry with versioning, lineage, and approval workflow
Automated training pipeline (CI for ML) and reproducible environments
Serving layer
Clear model API contracts (schemas, error handling); a request/response sketch follows this checklist
Release strategies (shadow, canary, A/B) and rollback procedures
SLOs for latency/availability plus capacity planning
Observability and operations
Data + model + system monitoring dashboards
Alerting with runbooks and incident management
Periodic evaluation reports and drift reviews
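To illustrate the model API contract item, here is a sketch of a request/response schema using pydantic; the field names, ranges, and constant score are illustrative assumptions, not a prescribed interface.

```python
from typing import Optional
from pydantic import BaseModel, Field

class PredictionRequest(BaseModel):
    customer_id: str
    days_since_last_order: float = Field(ge=0)  # reject nonsense inputs early
    avg_spend_30d: float = Field(ge=0)

class PredictionResponse(BaseModel):
    customer_id: str
    score: float = Field(ge=0, le=1)
    served_version: str                         # always report which model answered

def predict(req: PredictionRequest) -> PredictionResponse:
    # Real code would look up the model in a registry and run inference;
    # the constant score keeps the contract sketch self-contained.
    return PredictionResponse(customer_id=req.customer_id, score=0.5,
                              served_version="v1")
```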
Best practices and common pitfalls
Best practices:
Align feature computation across training and serving to minimize skew.
Treat data quality and metadata management as first-class production controls (owners, definitions, lineage).
Define success metrics at three levels: technical (AUC/RMSE), operational (latency/uptime), and business (conversion, cost, risk).
Build safe deployment paths and test rollback regularly.
Monitor segments and ground-truth lag explicitly; avoid “one global metric” monitoring.
Pitfalls:
Training on leaked or future information due to missing point-in-time controls.
Relying on manual retraining and undocumented thresholds.
Monitoring only endpoint uptime while ignoring input drift and prediction drift.
Shipping a model without clarifying how decisions will be overridden or handled in edge cases.
Summary of key takeaways
Production ML success depends on disciplined data management, controlled lifecycle practices, and robust operational monitoring. The hardest parts are typically feature consistency, reproducibility and lineage, safe deployment, and end-to-end observability. Approaching ML as a governed data product—with explicit contracts, quality controls, and lifecycle ownership—reduces failures and makes improvements sustainable.