CNNs use convolutions to learn spatial patterns from images.
Convolution: Slide a filter across the image, computing dot products. Output is a feature map.
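The sliding-dot-product definition can be sketched in a few lines of NumPy (a naive loop version for clarity; the `conv2d` name, example image, and edge-detector kernel are illustrative, not from the notes):

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' cross-correlation (what CNNs call convolution):
    slide the kernel over the image, take a dot product at each position."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

image = np.arange(16.0).reshape(4, 4)
edge = np.array([[1.0, -1.0]])   # tiny horizontal edge-detector filter
fmap = conv2d(image, edge)       # feature map, shape (4, 3)
```

Note the output shrinks to (H-kH+1, W-kW+1); real frameworks add padding options to control this.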
Key properties:
- Parameter sharing: Same filter everywhere
- Translation equivariance: Pattern detection works regardless of position
- Local connectivity: Each output depends on small input region
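Translation equivariance can be checked directly: shifting the input then convolving gives the same result as convolving then shifting. A sketch using circular (wrap-around) padding, where the property holds exactly (the `circ_corr` helper is an assumed illustration, not a standard API):

```python
import numpy as np

def circ_corr(x, k):
    """Centered 3x3 cross-correlation with circular padding."""
    out = np.zeros_like(x)
    for di in range(3):
        for dj in range(3):
            # align x[i+di-1, j+dj-1] (mod size) under kernel tap k[di, dj]
            out += k[di, dj] * np.roll(x, (-(di - 1), -(dj - 1)), axis=(0, 1))
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k = rng.standard_normal((3, 3))

shift_then_conv = circ_corr(np.roll(x, 2, axis=1), k)
conv_then_shift = np.roll(circ_corr(x, k), 2, axis=1)
# the two orders agree: shifted patterns produce shifted detections
```

With zero padding the equality is only approximate near borders, but the intuition is the same.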
Pooling: Downsample feature maps by summarizing each local window (typically 2x2, stride 2). Max pooling (keep the window maximum) is most common.
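A 2x2 stride-2 max pool is a one-liner with a reshape trick (assumes even spatial dimensions; the `max_pool2x2` name and sample values are illustrative):

```python
import numpy as np

def max_pool2x2(fmap):
    """2x2 max pooling, stride 2: keep the largest value in each window."""
    H, W = fmap.shape
    return fmap.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

fmap = np.array([[1., 2., 5., 6.],
                 [3., 4., 7., 8.],
                 [9., 1., 2., 3.],
                 [0., 2., 4., 1.]])
pooled = max_pool2x2(fmap)   # halves each spatial dimension: shape (2, 2)
```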
Interview question: "Why convolutions over fully-connected?"
Far fewer parameters: a conv filter has a fixed number of weights regardless of image size, whereas a fully-connected layer needs one weight per input pixel per output unit. Parameter sharing also bakes in translation equivariance, a useful inductive bias for images.
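The parameter gap is easy to quantify. Assuming (hypothetically, for illustration) a 224x224 RGB input and 64 output channels/units:

```python
# Hypothetical sizes: 224x224 RGB input, 64 output channels/units
H, W, C_in, C_out = 224, 224, 3, 64

# Fully-connected layer mapping the flattened image to 64 units (weights only):
fc_params = (H * W * C_in) * C_out        # 224*224*3*64 = 9,633,792

# Conv layer with 64 filters of size 3x3 (weights only):
conv_params = (3 * 3 * C_in) * C_out      # 27*64 = 1,728, independent of H and W

print(fc_params, conv_params)
```

Roughly a 5000x reduction at this size, and the conv count stays the same if the image grows.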