Computer Vision Roadmap

Teaching machines to understand images and video.

Topics

Core tasks

  • Image Classification — assign a label to a whole image
  • Object Detection — localize objects with bounding boxes
  • Image Segmentation — label images at the pixel level

Motion and tracking

  • Optical Flow — estimate per-pixel motion between frames
  • Multi-Object Tracking — maintain object identities across frames
  • Video Understanding — recognize what's happening over time

3D and depth

  • 3D Vision and Depth — depth estimation, point clouds
  • Visual SLAM — simultaneous localization and mapping

Human understanding

  • Pose Estimation — detect body keypoints, skeleton-based action recognition

Techniques

  • Convolutional Neural Networks — how vision models work
  • Transfer Learning — adapt pretrained models to new tasks
  • Data Augmentation — expand training data with label-preserving transforms

Learning order

Phase 1: Foundations
  1. Convolutional Neural Networks (how vision models work)
  2. Image Classification (the "hello world" of vision)
  3. Transfer Learning (pretrained models)

Phase 2: Spatial tasks
  4. Object Detection (bounding boxes)
  5. Image Segmentation (pixel-level)
  6. Pose Estimation (body keypoints)
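
The spatial tasks in Phase 2 are evaluated (and their predictions matched to ground truth) with intersection over union, the overlap ratio between two boxes. A minimal pure-Python sketch — the `iou` helper and the (x1, y1, x2, y2) box format are my own conventions, not something the roadmap prescribes:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    # Corners of the overlap rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes don't overlap at all.
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

A detection typically counts as correct when its IoU with a ground-truth box exceeds a threshold, commonly 0.5.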

Phase 3: Temporal tasks
  7. Optical Flow (pixel motion)
  8. Multi-Object Tracking (identity across frames)
  9. Video Understanding (what's happening over time)
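
The heart of Phase 3's multi-object tracking is data association: deciding which detection in the current frame continues which existing track. Production trackers (SORT and its descendants) pair a motion model with optimal assignment; the sketch below is only a greedy nearest-centroid toy to show the idea, and every name and threshold in it is an assumption:

```python
from math import hypot

def assign_ids(prev, detections, max_dist=50.0, next_id=0):
    """Greedy nearest-neighbour ID assignment between frames.

    prev: dict id -> (x, y) track centroid from the last frame.
    detections: list of (x, y) centroids in the current frame.
    Returns (dict id -> centroid for this frame, next unused id).
    """
    # All track/detection pairs, closest first.
    pairs = sorted(
        (hypot(px - dx, py - dy), tid, j)
        for tid, (px, py) in prev.items()
        for j, (dx, dy) in enumerate(detections)
    )
    assigned, used_tracks, used_dets = {}, set(), set()
    for dist, tid, j in pairs:
        if dist > max_dist:
            break  # remaining pairs are even farther apart
        if tid in used_tracks or j in used_dets:
            continue  # each track/detection matched at most once
        assigned[tid] = detections[j]
        used_tracks.add(tid)
        used_dets.add(j)
    # Unmatched detections start new tracks; unmatched tracks are
    # simply dropped (no re-identification in this toy version).
    for j, det in enumerate(detections):
        if j not in used_dets:
            assigned[next_id] = det
            next_id += 1
    return assigned, next_id
```

Running it over two frames keeps identities stable as the objects drift: track 0 follows the object near (10, 10) even though the detection order changes.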

Phase 4: 3D understanding
  10. 3D Vision and Depth (depth estimation, point clouds)
  11. Visual SLAM Concepts (localization + mapping)

Phase 5: Applied
  12. Tutorial - Object Tracking Pipeline (build a tracker)
  13. Tutorial - Aerial Image Analysis (satellite/drone imagery)
  14. Case Study - CV Pipeline Design (design judgment)

The modern workflow

1. Pick a pretrained model (torchvision, timm, huggingface)
2. Replace the classification head
3. Apply data augmentation
4. Fine-tune on your data
5. Evaluate and iterate

You almost never train a vision model from scratch.

Tutorials and applied

  • Object Tracking Pipeline — build a tracker end to end
  • Aerial Image Analysis — work with satellite and drone imagery

Design judgment

  • CV Pipeline Design — case study in choosing the right approach for a problem

Key libraries

  • torchvision — datasets, models, transforms
  • timm — huge collection of pretrained models
  • albumentations — fast image augmentation
  • ultralytics — YOLO for detection and tracking
  • opencv-python — classical CV, optical flow, feature matching
  • open3d — point cloud processing and visualization
  • rasterio — geospatial image loading (satellite data)
  • mediapipe — pose estimation, face/hand detection (on-device)