Training Projects

Structured training exercises on real datasets. Each project lists a dataset, a clear goal, and the techniques you are expected to use.

Beginner — Classical ML

1. House Price Prediction

  • Dataset: California Housing (sklearn built-in) or Ames Housing
  • Goal: predict house prices
  • Techniques: linear regression, feature engineering, Ridge/Lasso
  • Code: ../projects/01_house_prices/
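
To make the Ridge idea concrete, here is a minimal sketch of ridge regression on a single feature, written in plain Python rather than with scikit-learn. The function name `fit_ridge_1d` and the toy numbers are assumptions made for illustration and are not taken from the project code. With one centered feature and an unpenalized intercept, the ridge slope is Sxy / (Sxx + alpha), so a larger `alpha` shrinks the slope toward zero.

```python
from statistics import fmean

def fit_ridge_1d(xs, ys, alpha=1.0):
    """Fit y ~ intercept + slope * x with an L2 penalty on the slope only."""
    x_mean, y_mean = fmean(xs), fmean(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)          # alpha = 0 recovers ordinary least squares
    intercept = y_mean - slope * x_mean  # the intercept is not penalized
    return slope, intercept

# Made-up toy data standing in for (income, house value) pairs.
incomes = [1.0, 2.0, 3.0, 4.0, 5.0]
values = [1.1, 1.9, 3.2, 3.9, 5.1]

ols_slope, _ = fit_ridge_1d(incomes, values, alpha=0.0)
ridge_slope, _ = fit_ridge_1d(incomes, values, alpha=5.0)
print(ols_slope, ridge_slope)
```

In the project itself, a library implementation with several features and cross-validated `alpha` would be the more natural choice.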

2. Spam Classifier

  • Dataset: SMS Spam Collection (UCI)
  • Goal: classify messages as spam/ham
  • Techniques: TF-IDF, Naive Bayes, logistic regression
  • Code: ../projects/02_spam_classifier/
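
As a rough illustration of the Naive Bayes step, the sketch below implements a multinomial Naive Bayes classifier with Laplace smoothing in plain Python. It uses raw word counts instead of TF-IDF to stay short, and the names `train_nb` and `predict_nb` as well as the four toy messages are assumptions for this example, not part of the SMS Spam Collection or the project code.

```python
import math
from collections import Counter

def train_nb(messages, labels):
    """Count words per class; returns (word_counts, class_counts, vocab)."""
    word_counts = {label: Counter() for label in set(labels)}
    class_counts = Counter(labels)
    for text, label in zip(messages, labels):
        word_counts[label].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, class_counts, vocab

def predict_nb(model, text):
    """Return the class with the highest log posterior (up to a constant)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(class_counts[label] / total_docs)  # log prior
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothed log likelihood
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Made-up toy messages for illustration only.
messages = [
    "win a free prize now",
    "free cash win now",
    "are we meeting for lunch",
    "see you at lunch tomorrow",
]
labels = ["spam", "spam", "ham", "ham"]

model = train_nb(messages, labels)
print(predict_nb(model, "free prize"))
print(predict_nb(model, "lunch tomorrow"))
```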

3. Customer Segmentation

  • Dataset: Mall Customers (Kaggle)
  • Goal: find customer groups
  • Techniques: K-means, PCA, visualization
  • Code: ../projects/03_customer_segmentation/
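
The K-means step can be sketched in a few lines of plain Python. This version initializes the centroids from the first `k` points, which is a simplification chosen to keep the example deterministic (library implementations typically use smarter initialization such as k-means++). The points are made-up (income, spending score) pairs, not rows from the Mall Customers dataset.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, n_iters=10):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    centroids = list(points[:k])  # naive init: first k points (assumption)
    assignments = []
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments

# Made-up (annual income, spending score) pairs with two obvious groups.
customers = [(15, 20), (80, 85), (18, 15), (85, 90), (20, 25), (90, 80)]
centroids, groups = kmeans(customers, k=2)
print(centroids, groups)
```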

Intermediate — Deep Learning

4. MNIST Digit Classifier

  • Dataset: MNIST (torchvision built-in)
  • Goal: classify handwritten digits, reaching >99% accuracy
  • Techniques: feedforward net → CNN, training loop, evaluation
  • Code: ../projects/04_mnist/

5. CIFAR-10 with Transfer Learning

  • Dataset: CIFAR-10 (torchvision built-in)
  • Goal: classify 10 object categories
  • Techniques: pretrained ResNet, fine-tuning, data augmentation
  • Code: ../projects/05_cifar10_transfer/

6. Sentiment Analysis Pipeline

  • Dataset: IMDB Reviews (Hugging Face datasets)
  • Goal: positive/negative review classification
  • Techniques: TF-IDF baseline → fine-tuned DistilBERT
  • Code: ../projects/06_sentiment/

7. Time Series Forecasting

  • Dataset: Jena Climate (as used in the Keras time series examples) or Air Passengers
  • Goal: predict temperature/passenger count
  • Techniques: feature engineering, LSTM, transformer
  • Code: ../projects/07_time_series/
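
For the feature engineering step, one common approach is to turn the series into sliding windows so that any supervised model (linear, LSTM, or transformer) can be trained on it. The helper name `make_windows` and the short series below are assumptions for illustration. Each sample pairs `window` consecutive values with the value `horizon` steps after the window ends.

```python
def make_windows(series, window, horizon=1):
    """Turn a 1-D series into (inputs, target) pairs for supervised learning."""
    samples = []
    for start in range(len(series) - window - horizon + 1):
        inputs = series[start:start + window]
        target = series[start + window + horizon - 1]
        samples.append((inputs, target))
    return samples

# Short illustrative series (monthly counts); not the full dataset.
passengers = [112, 118, 132, 129, 121, 135]
samples = make_windows(passengers, window=3)
for inputs, target in samples:
    print(inputs, "->", target)
```

When splitting such samples into training and validation sets, keeping the split chronological avoids leaking future values into training.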

Advanced — Modern AI

8. Build a RAG System

  • Dataset: Wikipedia subset or your own documents
  • Goal: question-answering over custom knowledge base
  • Techniques: embeddings, FAISS, retrieval + LLM generation
  • Code: ../projects/08_rag_system/
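
The retrieval half of the pipeline can be sketched without any external libraries. In the code below, `embed` is a toy bag-of-words stand-in for a real embedding model, and the sorted cosine ranking stands in for a FAISS index. Both are assumptions made so the example stays self-contained. The resulting prompt would then be passed to whichever LLM the project uses.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, documents, k=2):
    """Assemble retrieved context and the question into one prompt string."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Made-up mini knowledge base for illustration.
docs = [
    "paris is the capital of france",
    "the eiffel tower is in paris",
    "python is a programming language",
]
question = "what is the capital of france"
top_docs = retrieve(question, docs, k=2)
prompt = build_prompt(question, docs, k=2)
print(prompt)
```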

9. Fine-Tune a Small LLM

  • Dataset: Alpaca or custom instruction pairs
  • Goal: instruction-following model from a base model
  • Techniques: QLoRA, training loop, evaluation
  • Code: ../projects/09_finetune_llm/

10. Object Detection

  • Dataset: COCO subset or Pascal VOC
  • Goal: detect and localize objects in images
  • Techniques: YOLOv8, fine-tuning, mAP evaluation
  • Code: ../projects/10_object_detection/
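
mAP evaluation rests on intersection over union (IoU), which decides whether a predicted box counts as a match for a ground-truth box. A minimal sketch follows, assuming boxes are given as `(x_min, y_min, x_max, y_max)` tuples. That box format is an assumption here, and detection libraries and datasets differ in the conventions they use.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height clamp to zero when the boxes do not overlap.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))  # intersection 1, union 7
print(overlap)
```

A prediction is usually treated as a true positive when its IoU with a ground-truth box exceeds a chosen threshold, such as 0.5.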

11. RL Agent

  • Dataset: Gymnasium environments (CartPole → LunarLander)
  • Goal: train an agent to solve progressively harder environments
  • Techniques: Q-learning → DQN → PPO
  • Code: ../projects/11_rl_agent/
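
The first technique in that progression, tabular Q-learning, can be sketched on a tiny hand-written environment. The five-state chain below is a hypothetical stand-in, not a Gymnasium environment, and the hyperparameters are arbitrary choices for illustration. The agent starts at state 0 and receives a reward of 1 on reaching state 4.

```python
import random

N_STATES = 5        # states 0..4; state 4 is terminal (hypothetical toy chain)
ACTIONS = (-1, +1)  # action index 0 moves left, index 1 moves right

def step(state, action_idx):
    """Apply an action; walls clamp the agent to the chain."""
    next_state = min(max(state + ACTIONS[action_idx], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy action selection with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _episode in range(episodes):
        state = 0
        for _step in range(100):  # cap episode length
            action = choose_action(q[state], epsilon, rng)
            next_state, reward, done = step(state, action)
            # Q-learning target: bootstrap from the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

q_table = train()
# Greedy policy for the non-terminal states (1 means "move right").
policy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

DQN replaces the table with a neural network over observations, which is what makes environments with continuous states such as CartPole tractable.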