Machine Learning Roadmap
What is ML?
Programs that learn patterns from data instead of being explicitly programmed.
Traditional: rules + data → answers
ML: data + answers → rules (a model)
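The contrast above can be made concrete with a tiny sketch: hand a fitter some data and the matching answers, and it recovers the rule. The hidden rule y = 2x + 1 and the use of `np.polyfit` are illustrative choices, not part of the roadmap:

```python
import numpy as np

# Hidden rule we pretend not to know: y = 2*x + 1 (plus a little noise).
rng = np.random.default_rng(0)
x = np.arange(10, dtype=float)
y = 2 * x + 1 + rng.normal(0, 0.01, size=x.shape)

# ML direction: data + answers in, rule (slope and intercept) out.
slope, intercept = np.polyfit(x, y, deg=1)
print(slope, intercept)  # close to 2.0 and 1.0
```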
Core concepts
- Supervised vs Unsupervised Learning — the two main paradigms
- Bias-Variance Tradeoff — underfitting vs overfitting
- Loss Functions — what models optimize
- Cross-Validation — robust evaluation
- Evaluation Metrics — accuracy, precision, recall, F1, AUC
- Hyperparameter Tuning — finding the best model settings
- Regularization — preventing overfitting
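Several of these concepts (regularization, cross-validation, an evaluation metric) come together in one short scikit-learn sketch; the breast-cancer dataset, logistic regression, and the specific `C` value are illustrative choices, not prescriptions:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# C is the inverse regularization strength: smaller C = stronger penalty.
model = make_pipeline(StandardScaler(), LogisticRegression(C=1.0, max_iter=1000))

# 5-fold cross-validation: train on 4 folds, score on the held-out fold, repeat.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(scores.mean())
```

Swapping `scoring="accuracy"` for `"f1"` or `"roc_auc"` exercises the other metrics listed above.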
Supervised Learning — Regression
- Linear Regression — the simplest model and the foundation for the rest
- Polynomial Regression — fitting curves
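A minimal sketch of both models on synthetic curved data (the quadratic ground truth and degree choice are assumptions for illustration): a straight line cannot bend, so polynomial features close the gap.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = 0.5 * X.ravel() ** 2 + rng.normal(0, 0.2, 100)  # quadratic truth + noise

linear = LinearRegression().fit(X, y)
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

print(f"linear R^2: {linear.score(X, y):.2f}")  # poor: line can't fit a parabola
print(f"poly R^2:   {poly.score(X, y):.2f}")    # good: curve matches the data
```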
Supervised Learning — Classification
- Logistic Regression — linear classification, probability outputs
- Decision Trees — interpretable, rule-based
- Random Forests — ensemble of trees, robust
- Gradient Boosting (XGBoost, LightGBM) — sequential tree ensembles, often win tabular competitions
- Support Vector Machines — maximum margin classifiers
- K-Nearest Neighbors — classify by similar examples
- Naive Bayes — probabilistic, fast, good for text
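As one worked example from this list, a random forest trained and evaluated with the standard split-fit-report workflow; the synthetic dataset and hyperparameters are illustrative, and any classifier above slots into the same pattern:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic binary classification problem.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Precision, recall, and F1 per class on the held-out test set.
print(classification_report(y_te, clf.predict(X_te)))
```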
Unsupervised Learning
- K-Means Clustering — partition data into k groups
- PCA — dimensionality reduction, find principal directions
- Anomaly Detection — find unusual data points
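PCA and K-Means compose naturally: reduce dimensions first, then cluster. A minimal sketch on the iris data (the dataset, k=3, and 2 components are illustrative assumptions):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)  # labels ignored: unsupervised setting

# PCA: project 4 features onto the 2 directions of greatest variance.
X_2d = PCA(n_components=2).fit_transform(X)

# K-Means: partition the projected points into 3 clusters.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)
print(X_2d.shape, sorted(set(labels)))  # (150, 2) [0, 1, 2]
```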
Learning order
- Linear Regression → understand loss, gradient descent, evaluation
- Logistic Regression → extend to classification
- Decision Trees → understand non-linear models
- Random Forests + Gradient Boosting → practical workhorse models
- Evaluation + Cross-Validation → proper methodology
- Unsupervised → clustering, PCA when needed
Links
- Data Fundamentals Roadmap — data preparation feeds into modeling
- Deep Learning Roadmap — when classical ML isn’t enough