MLOps Roadmap
Taking ML models from notebooks to production. The engineering side of ML.
ML system design principles
The gap between “model works in notebook” and “model works in production” is where most ML projects die. Key principles:
- Reproducibility: any result must be recreatable. Version code, data, config, and environment together
- Testability: test data assumptions, model behavior, and infrastructure — not just unit tests
- Modularity: separate data loading, preprocessing, training, and serving so each can change independently
- Automation: if a human does it twice, automate it. Manual steps are where errors live
Topics
- Experiment Tracking — log runs, compare results, reproduce experiments
- Model Serving — deploy models as APIs
- ML Pipelines — automate data → train → evaluate → deploy
- Model Monitoring — detect drift, degradation, failures in production
- Feature Stores — centralized feature management for training and serving
Key tools
| Category | Tools |
|---|---|
| Experiment tracking | MLflow, Weights & Biases, TensorBoard |
| Model serving | FastAPI, TorchServe, Triton, ONNX Runtime |
| Pipelines | Airflow, Prefect, Kubeflow, Dagster |
| Feature stores | Feast, Tecton, Hopsworks |
| Data versioning | DVC |
| Containerization | Docker |
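The pipeline tools in the table all orchestrate the same basic shape: data → train → evaluate as dependent steps, each consuming the previous output. A bare-bones sketch of that chaining (the step functions are toy placeholders, not real training code):

```python
from typing import Callable


def run_pipeline(steps: list[Callable]) -> object:
    """Run steps in order, feeding each output to the next; fail fast on error."""
    result = None
    for step in steps:
        result = step(result)
    return result


# Toy stages standing in for real data/train/evaluate logic.
def load_data(_):
    return [(x, 2 * x) for x in range(10)]          # (feature, label) pairs

def train(data):
    # "Model" = mean label/feature ratio; a placeholder for real fitting.
    return sum(y / x for x, y in data if x) / sum(1 for x, _ in data if x)

def evaluate(model):
    return {"model": model, "error": abs(model - 2.0)}
```

What Airflow, Prefect, or Dagster add on top of this loop is scheduling, retries, caching, and observability; the dependency structure itself stays this simple.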
The progression
- Jupyter notebook (exploration)
- Python script (reproducible)
- Experiment tracking (comparable)
- Docker container (portable)
- API endpoint (accessible)
- Monitoring + retraining (reliable)
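The "API endpoint" step usually means wrapping a predict function behind FastAPI or similar. The framework-independent core is a pure request handler, sketched here (the linear model and request schema are made up for illustration):

```python
import json


def predict_handler(body: str, model_weights: dict) -> dict:
    """Validate a JSON request and return a prediction; a FastAPI or Flask
    route would be a thin wrapper around this function."""
    try:
        payload = json.loads(body)
        features = payload["features"]            # expected: list of floats
    except (json.JSONDecodeError, KeyError):
        return {"status": 400, "error": "body must be JSON with a 'features' list"}
    if len(features) != len(model_weights["w"]):
        return {"status": 400, "error": "wrong feature count"}
    # Toy linear model: dot(w, x) + b
    score = sum(w * x for w, x in zip(model_weights["w"], features)) + model_weights["b"]
    return {"status": 200, "prediction": score}
```

Keeping the handler pure like this makes it unit-testable without starting a server, which is where most serving bugs (bad input handling, shape mismatches) are caught.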
Maturity levels
| Level | What you have | What’s missing |
|---|---|---|
| 0 | Notebooks, manual everything | Reproducibility, automation |
| 1 | Scripts + experiment tracking | CI/CD, monitoring |
| 2 | Automated pipelines + model registry | Drift detection, feature stores |
| 3 | Full CI/CD, monitoring, auto-retraining | You’re doing well |
Most teams sit at level 0 or 1. Getting to level 2 removes most of the day-to-day production pain.
Technical debt in ML
Google’s “Hidden Technical Debt in Machine Learning Systems” paper (2015) showed that the modeling code is a tiny fraction of a real ML system. The rest:
- Data collection, cleaning, validation — the largest time sink
- Feature extraction and management — duplicated across teams without feature stores
- Configuration and pipeline glue — the fragile connective tissue
- Monitoring and testing — often bolted on as an afterthought
The paper’s key insight: ML systems have all the maintenance problems of traditional software, plus a set of ML-specific issues (data dependencies, feedback loops, entanglement between features).
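The data-validation sink above is where cheap automated checks pay off first. A sketch of assertion-style checks run before every training job (the column names and ranges are illustrative, not a general schema):

```python
def validate_rows(rows: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the batch passes.
    Required columns and value ranges here are examples only."""
    problems = []
    required = {"age", "income"}
    for i, row in enumerate(rows):
        missing = required - row.keys()
        if missing:
            problems.append(f"row {i}: missing {sorted(missing)}")
            continue
        if not (0 <= row["age"] <= 120):
            problems.append(f"row {i}: age {row['age']} out of range")
        if row["income"] < 0:
            problems.append(f"row {i}: negative income")
    return problems
```

Failing the pipeline when this list is non-empty catches the silent data problems (schema changes, out-of-range values) that monitoring would otherwise surface weeks later as degraded predictions.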
Reading list
- “Hidden Technical Debt in Machine Learning Systems” (Sculley et al., 2015) — the foundational paper
- “Rules of Machine Learning” (Martin Zinkevich, Google) — practical engineering wisdom
- “Reliable Machine Learning” (Cathy Chen et al., O’Reilly) — production ML systems
- “Designing Machine Learning Systems” (Chip Huyen) — end-to-end ML system design
Links
- Machine Learning Roadmap — the ML side of things
- Projects Index — hands-on practice