NLP Roadmap
Natural Language Processing (NLP) — teaching machines to understand and generate human language.
Topics
Foundations
- Text Preprocessing — tokenization, cleaning, normalization
- Bag of Words and TF-IDF — classical text representations
- Embeddings — dense vector representations of words
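The classical representations above can be sketched in a few lines. A minimal TF-IDF scorer over a toy corpus (using a common smoothed-IDF variant, `log(N / (1 + df)) + 1`; real pipelines tokenize and normalize far more carefully):

```python
import math

# Toy corpus; whitespace split stands in for real tokenization.
docs = [
    "the cat sat on the mat",
    "the dog ate my homework",
    "the cat chased the dog",
]
tokenized = [d.split() for d in docs]

def tf_idf(term, doc, corpus):
    """TF-IDF with a smoothed IDF: tf * (log(N / (1 + df)) + 1)."""
    tf = doc.count(term) / len(doc)                    # term frequency in this doc
    df = sum(1 for d in corpus if term in d)           # document frequency
    idf = math.log(len(corpus) / (1 + df)) + 1         # rarer terms score higher
    return tf * idf
```

With this weighting, a distinctive word like "homework" outscores a word like "the" that appears in every document, which is the whole point of moving from raw bag-of-words counts to TF-IDF.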
Modern NLP
- Language Models — from n-grams to GPT
- BERT and Masked Language Models — understanding text
- Text Classification — sentiment, spam, topic detection
- Named Entity Recognition — extracting entities from text
- Text Generation — autoregressive models, sampling strategies
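Two of the topics above — language models "from n-grams" and sampling strategies — fit in one small sketch: a toy bigram model trained on counts, generating text with temperature sampling (here applied to raw counts rather than logits, a simplification; names and corpus are made up for illustration):

```python
import random
from collections import defaultdict, Counter

# Tiny corpus with "." as a sentence separator.
corpus = "the cat sat . the cat ran . the dog sat . the cat sat".split()

# Bigram counts: counts[prev][next] = how often `next` follows `prev`.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def sample_next(prev, temperature=1.0, rng=random):
    """Sample the next token; low temperature sharpens toward the most frequent successor."""
    items = list(counts[prev].items())
    weights = [c ** (1.0 / temperature) for _, c in items]
    r = rng.random() * sum(weights)
    for (tok, _), w in zip(items, weights):
        r -= w
        if r <= 0:
            return tok
    return items[-1][0]

def generate(start="the", length=6, temperature=1.0):
    out = [start]
    for _ in range(length):
        out.append(sample_next(out[-1], temperature))
    return " ".join(out)
```

The same sampling ideas (temperature, top-k, nucleus) carry over to autoregressive transformers; only the probability model changes.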
Advanced
- Retrieval Augmented Generation — grounding LLMs with external knowledge
- Prompt Engineering — getting the most from large language models
- Fine-Tuning LLMs — adapting pretrained models to specific tasks
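The retrieval half of RAG can be sketched without any model at all: embed documents (here, plain bag-of-words counts stand in for a real embedding model), rank by cosine similarity, and prepend the best match to the prompt. Everything here — the corpus, `retrieve`, `build_prompt` — is an illustrative assumption, not a library API:

```python
import math
from collections import Counter

docs = [
    "The Eiffel Tower is in Paris and opened in 1889.",
    "Python is a programming language created by Guido van Rossum.",
    "Transformers use self-attention to process sequences in parallel.",
]

def embed(text):
    # Stand-in for a real embedding model: sparse word-count vector.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=1):
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query):
    # Ground the LLM by stuffing retrieved context into the prompt.
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

In a real system the word counts become dense embeddings, the sort becomes a vector index, and the prompt goes to an LLM — but the grounding step is exactly this shape.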
The evolution
Rules → Bag of Words → Word2Vec → RNNs → Transformers → LLMs
                                                         ↑
                                                    we are here
In practice, modern NLP = pretrained transformers + fine-tuning or prompting.
Links
- Transformers — the architecture
- Attention Mechanism — the core innovation
- Deep Learning Roadmap