MLOps Roadmap
Taking ML models from notebooks to production. The engineering side of ML.
ML system design principles
The gap between “model works in notebook” and “model works in production” is where most ML projects die. Key principles:
- Reproducibility: any result must be recreatable. Version code, data, config, and environment together
- Testability: test data assumptions, model behavior, and infrastructure — not just unit tests
- Modularity: separate data loading, preprocessing, training, and serving so each can change independently
- Automation: if a human does it twice, automate it. Manual steps are where errors live
Topics
- Experiment Tracking — log runs, compare results, reproduce experiments
- Model Serving — deploy models as APIs
- ML Pipelines — automate data → train → evaluate → deploy
- Model Monitoring — detect drift, degradation, failures in production
- Feature Stores — centralized feature management for training and serving
Key tools
| Category | Tools |
|---|---|
| Experiment tracking | MLflow, Weights & Biases, TensorBoard |
| Model serving | FastAPI, TorchServe, Triton, ONNX Runtime |
| Pipelines | Airflow, Prefect, Kubeflow, Dagster |
| Feature stores | Feast, Tecton, Hopsworks |
| Data versioning | DVC |
| Containerization | Docker |
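The pipeline tools in the table all orchestrate the same basic shape: data → train → evaluate as dependent steps, each consuming the previous output. A bare-bones sketch of that chaining (the step functions are toy placeholders, not real training code):

```python
from typing import Callable


def run_pipeline(steps: list[Callable]) -> object:
    """Run steps in order, feeding each output to the next; fail fast on error."""
    result = None
    for step in steps:
        result = step(result)
    return result


# Toy stages standing in for real data/train/evaluate logic.
def load_data(_):
    return [(x, 2 * x) for x in range(10)]          # (feature, label) pairs

def train(data):
    # "Model" = mean label/feature ratio; a placeholder for real fitting.
    return sum(y / x for x, y in data if x) / sum(1 for x, _ in data if x)

def evaluate(model):
    return {"model": model, "error": abs(model - 2.0)}
```

What Airflow, Prefect, or Dagster add on top of this loop is scheduling, retries, caching, and observability; the dependency structure itself stays this simple.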
The progression
- Jupyter notebook (exploration)
- Python script (reproducible)
- Experiment tracking (comparable)
- Docker container (portable)
- API endpoint (accessible)
- Monitoring + retraining (reliable)
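The "API endpoint" step usually means wrapping a predict function behind FastAPI or similar. The framework-independent core is a pure request handler, sketched here (the linear model and request schema are made up for illustration):

```python
import json


def predict_handler(body: str, model_weights: dict) -> dict:
    """Validate a JSON request and return a prediction; a FastAPI or Flask
    route would be a thin wrapper around this function."""
    try:
        payload = json.loads(body)
        features = payload["features"]            # expected: list of floats
    except (json.JSONDecodeError, KeyError):
        return {"status": 400, "error": "body must be JSON with a 'features' list"}
    if len(features) != len(model_weights["w"]):
        return {"status": 400, "error": "wrong feature count"}
    # Toy linear model: dot(w, x) + b
    score = sum(w * x for w, x in zip(model_weights["w"], features)) + model_weights["b"]
    return {"status": 200, "prediction": score}
```

Keeping the handler pure like this makes it unit-testable without starting a server, which is where most serving bugs (bad input handling, shape mismatches) are caught.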
Maturity levels
| Level | What you have | What’s missing |
|---|---|---|
| 0 | Notebooks, manual everything | Reproducibility, automation |
| 1 | Scripts + experiment tracking | CI/CD, monitoring |
| 2 | Automated pipelines + model registry | Drift detection, feature stores |
| 3 | Full CI/CD, monitoring, auto-retraining | You’re doing well |
Most teams sit at level 0 or 1. Getting to level 2 removes most of the day-to-day production pain.
Technical debt in ML
Google’s “Hidden Technical Debt in Machine Learning Systems” paper (2015) showed that the modeling code is a tiny fraction of a real ML system. The rest:
- Data collection, cleaning, validation — the largest time sink
- Feature extraction and management — duplicated across teams without feature stores
- Configuration and pipeline glue — the fragile connective tissue
- Monitoring and testing — often bolted on as an afterthought
The paper’s key insight: ML systems have all the maintenance problems of traditional software, plus a set of ML-specific issues (data dependencies, feedback loops, entanglement between features).
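The data-validation sink above is where cheap automated checks pay off first. A sketch of assertion-style checks run before every training job (the column names and ranges are illustrative, not a general schema):

```python
def validate_rows(rows: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the batch passes.
    Required columns and value ranges here are examples only."""
    problems = []
    required = {"age", "income"}
    for i, row in enumerate(rows):
        missing = required - row.keys()
        if missing:
            problems.append(f"row {i}: missing {sorted(missing)}")
            continue
        if not (0 <= row["age"] <= 120):
            problems.append(f"row {i}: age {row['age']} out of range")
        if row["income"] < 0:
            problems.append(f"row {i}: negative income")
    return problems
```

Failing the pipeline when this list is non-empty catches the silent data problems (schema changes, out-of-range values) that monitoring would otherwise surface weeks later as degraded predictions.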
Reading list
- “Hidden Technical Debt in Machine Learning Systems” (Sculley et al., 2015) — the foundational paper
- “Rules of Machine Learning” (Martin Zinkevich, Google) — practical engineering wisdom
- “Reliable Machine Learning” (Cathy Chen et al., O’Reilly) — production ML systems
- “Designing Machine Learning Systems” (Chip Huyen) — end-to-end ML system design
Links
- Machine Learning Roadmap — the ML side of things
- Projects Index — hands-on practice