Glossary

Quick reference for terms you’ll encounter constantly.

B

  • Backpropagation: computing gradients via the chain rule → Backpropagation
  • Batch normalization: normalize activations within a batch → Batch Normalization
  • Bias (model): error from oversimplified assumptions → Bias-Variance Tradeoff
  • Bias (parameter): the b in wx + b, an offset term
  • BPE: byte-pair encoding, subword tokenization → Tokenization
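
The chain rule behind backpropagation can be sketched for a single neuron, y = wx + b, with a squared-error loss. The values and the `forward_backward` helper are illustrative, not from the text:

```python
# Minimal sketch of backpropagation through y = w*x + b with squared-error loss.

def forward_backward(w, b, x, t):
    # Forward pass: prediction and loss.
    y = w * x + b
    loss = (y - t) ** 2
    # Backward pass: apply the chain rule, outermost derivative first.
    dloss_dy = 2 * (y - t)   # d(loss)/dy
    dloss_dw = dloss_dy * x  # dy/dw = x
    dloss_db = dloss_dy * 1  # dy/db = 1
    return loss, dloss_dw, dloss_db

loss, dw, db = forward_backward(w=2.0, b=1.0, x=3.0, t=10.0)
# y = 7.0, loss = 9.0, dw = -18.0, db = -6.0
```

Deep networks do exactly this, layer by layer, reusing each intermediate derivative instead of recomputing it.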

E

  • Embedding: dense vector representation of discrete data → Embeddings
  • Ensemble: combining multiple models (bagging, boosting, stacking) → Ensemble Methods
  • Epoch: one full pass through the training data

F

  • F1 score: harmonic mean of precision and recall → Evaluation Metrics
  • Feature: an input variable to the model
  • Feature store: centralized repository for ML features → Feature Stores
  • Fine-tuning: adapting a pretrained model → Transfer Learning
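
The F1 score above is just the harmonic mean of precision and recall; a few lines make it concrete (`f1_score` here is an illustrative helper, not a library import):

```python
def f1_score(precision, recall):
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

f1_score(0.5, 1.0)  # ~0.667: the harmonic mean punishes the lower of the two
```

Unlike the arithmetic mean, the harmonic mean stays low if either precision or recall is low, which is why F1 is preferred on imbalanced classes.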

I

  • Inference: using a trained model to make predictions

K

  • KV cache: cached attention key-value pairs reused across decoding steps for efficient transformer inference
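
A toy sketch of the idea behind a KV cache: keys and values for earlier tokens are computed once, stored, and reused at each decoding step, so each step only processes the newest token. Single vectors, pure Python, illustrative only:

```python
import math

class KVCache:
    """Toy per-layer cache of past attention keys and values."""

    def __init__(self):
        self.keys, self.values = [], []

    def step(self, q, k, v):
        # Cache this step's key/value, then attend over all cached steps.
        self.keys.append(k)
        self.values.append(v)
        scores = [sum(qi * ki for qi, ki in zip(q, key)) for key in self.keys]
        # Softmax over the scores (max-subtracted for stability).
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of cached values.
        dim = len(v)
        return [sum(w * val[d] for w, val in zip(weights, self.values))
                for d in range(dim)]

cache = KVCache()
out = cache.step(q=[1.0, 0.0], k=[1.0, 0.0], v=[5.0, 0.0])  # attends only to itself
```

Real implementations cache tensors per layer and per head, but the reuse pattern is the same.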

L

  • Learning rate: step size in gradient descent
  • LLM: large language model → Language Models
  • LoRA: low-rank adaptation for efficient fine-tuning → LoRA and PEFT
  • Loss function: measures prediction error → Loss Functions
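
The learning-rate entry comes down to one update rule: move against the gradient, scaled by the learning rate. A minimal sketch (`sgd_step` is an illustrative name):

```python
def sgd_step(w, grad, lr=0.1):
    # One gradient-descent update: step against the gradient, scaled by lr.
    return w - lr * grad

# Minimize f(w) = w**2, whose gradient is 2*w, starting from w = 1.0.
w = 1.0
for _ in range(50):
    w = sgd_step(w, grad=2 * w, lr=0.1)
# w has shrunk toward the minimum at 0
```

Too large a learning rate makes the iterates diverge; too small a rate makes convergence needlessly slow.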

M

  • MLP: multi-layer perceptron (feedforward neural net)
  • MSE: mean squared error → Loss Functions
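
The MSE entry as a one-liner (illustrative helper, not a library function):

```python
def mse(predictions, targets):
    # Mean squared error: average of the squared differences.
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)

mse([2.0, 4.0], [1.0, 7.0])  # (1 + 9) / 2 = 5.0
```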

S

  • Scaling laws: power-law relationships between compute/data/performance → Scaling Laws
  • SGD: stochastic gradient descent → Gradient Descent
  • Softmax: convert logits to probabilities summing to 1
  • SVD: singular value decomposition → Matrix Decomposition
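
The softmax entry, written out as the standard numerically stable implementation (subtracting the max before exponentiating avoids overflow):

```python
import math

def softmax(logits):
    # Shift by the max for numerical stability, then exponentiate and normalize.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

softmax([1.0, 2.0, 3.0])  # probabilities summing to 1, largest for logit 3.0
```

The shift changes nothing mathematically, since the constant factor exp(-m) cancels in the normalization.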

T

  • Tensor: n-dimensional array (generalization of matrices)
  • Tokenization: splitting text into units → Text Preprocessing
  • Transfer learning: reuse pretrained models → Transfer Learning
  • Transformer: the attention-based architecture dominant in modern deep learning → Transformers
