Residual Networks

What

Neural networks with skip connections (shortcut connections) that let input bypass one or more layers. Instead of learning a full mapping H(x), the layers learn a residual F(x) = H(x) - x, and the output is y = F(x) + x.

The problem ResNets solved

Before 2015, simply stacking more layers made networks perform worse — and not because of overfitting: on CIFAR-10, a plain 56-layer network had higher training error than a 20-layer one. That shouldn't happen if depth only adds capacity, since the extra layers could in principle learn the identity.

Why skip connections work

  • Gradient highway: gradients flow directly through the skip connection, bypassing layers that might squash them; this mitigates Vanishing and Exploding Gradients
  • Easy to learn identity: if a block isn’t helpful, its weights can go to zero and the block passes the input through unchanged, so a deeper network is never forced to be worse than a shallower one
  • Residual is easier to learn: learning a small adjustment to the input is simpler than learning the full transformation from scratch

Input x ──┬──→ [Conv → BN → ReLU → Conv → BN] ──→ (+) ──→ ReLU ──→ Output
           │                                         ↑
           └─────────── skip connection ─────────────┘
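The identity-shortcut behaviour is easy to check numerically. A minimal NumPy sketch, with a fully connected residual branch standing in for the conv block in the diagram: when the branch’s weights are zero, the whole block reduces to the identity.

```python
import numpy as np

def residual_block(x, W1, W2):
    """y = F(x) + x, where F(x) = W2 @ relu(W1 @ x)."""
    f = W2 @ np.maximum(W1 @ x, 0.0)  # residual branch F(x)
    return f + x                      # skip connection adds the input back

rng = np.random.default_rng(0)
x = rng.standard_normal(8)

# Zero weights: the block passes the input through unchanged (identity)
W_zero = np.zeros((8, 8))
assert np.allclose(residual_block(x, W_zero, W_zero), x)

# Small nonzero weights: the block learns an adjustment on top of x,
# so the output stays close to the input rather than replacing it
W1 = 0.1 * rng.standard_normal((8, 8))
W2 = 0.1 * rng.standard_normal((8, 8))
y = residual_block(x, W1, W2)
print(np.linalg.norm(y - x))  # small relative to |x|
```

(The weight shapes and scale here are illustrative, not from the paper.)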

ResNet architecture

Variant      Layers  Parameters  Top-1 accuracy (ImageNet)
ResNet-18    18      11M         ~69%
ResNet-34    34      21M         ~73%
ResNet-50    50      25M         ~76%
ResNet-101   101     44M         ~77%
ResNet-152   152     60M         ~78%

Bottleneck blocks

ResNet-50 and deeper variants use bottleneck blocks to reduce computation:

1x1 conv (reduce channels: 256 → 64)
3x3 conv (process at reduced dimension)
1x1 conv (expand channels: 64 → 256)

The 1x1 convolutions squeeze and expand the channel dimension, so the expensive 3x3 convolution works on fewer channels. This makes deeper networks practical.
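The savings are easy to count. Ignoring biases and batch norm, a k×k conv has k·k·c_in·c_out weights; a quick sketch comparing a basic block (two 3x3 convs at 256 channels, as in ResNet-18/34) against the 256 → 64 → 256 bottleneck above:

```python
def conv_params(k, c_in, c_out):
    # weights in a k x k convolution, ignoring bias/BN
    return k * k * c_in * c_out

# Basic block: two plain 3x3 convs at 256 channels
basic = 2 * conv_params(3, 256, 256)

# Bottleneck: 1x1 reduce, 3x3 at the reduced width, 1x1 expand
bottleneck = (conv_params(1, 256, 64)
              + conv_params(3, 64, 64)
              + conv_params(1, 64, 256))

print(basic)       # 1,179,648
print(bottleneck)  # 69,632 — roughly 17x fewer weights
```

The expensive 3x3 conv runs at 64 channels instead of 256, which is where almost all of the reduction comes from.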

Impact

  • Enabled training networks with 100+ layers (up to 1000+ in experiments)
  • Won ImageNet 2015 by a large margin
  • Skip connections became standard in almost every modern architecture: DenseNet, U-Net, Transformers (residual connections around attention and FFN layers)
  • Transfer Learning with pretrained ResNets is one of the most common starting points for vision tasks