A neural network is a stack of linear transformations interleaved with non-linear activations.
Forward pass: input flows through the layers in sequence; each layer applies a linear map plus an activation, and its output becomes the next layer's input.
Parameters: Weights W and biases b. Learned during training.
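A minimal forward-pass sketch in NumPy; the layer sizes, ReLU activation, and random initialization are illustrative assumptions, not fixed by these notes.

```python
import numpy as np

def relu(z):
    # Element-wise non-linearity; any non-linear activation would do here.
    return np.maximum(0.0, z)

def forward(x, params):
    # params: list of (W, b) pairs, one per layer.
    # Hidden layers compute relu(W @ a + b); the final layer stays linear.
    a = x
    for W, b in params[:-1]:
        a = relu(W @ a + b)
    W, b = params[-1]
    return W @ a + b

# Assumed shapes: 4 inputs -> 8 hidden units -> 2 outputs.
rng = np.random.default_rng(0)
params = [
    (rng.normal(size=(8, 4)), np.zeros(8)),
    (rng.normal(size=(2, 8)), np.zeros(2)),
]
print(forward(rng.normal(size=4), params))  # 2-dimensional output
```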
Depth vs width: Deeper networks learn hierarchical features. Wider networks have more capacity per layer.
Universal approximation: a single hidden layer with enough units can approximate any continuous function on a compact domain to arbitrary accuracy. But deeper networks often reach the same accuracy with far fewer parameters.
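To make the parameter-efficiency point concrete, here is a quick count for two hypothetical fully connected nets (the layer sizes are made up for illustration):

```python
def param_count(layer_sizes):
    # Weights (n_in * n_out) plus biases (n_out) for each consecutive pair.
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

wide = param_count([10, 1000, 1])        # one wide hidden layer
deep = param_count([10, 50, 50, 50, 1])  # three narrow hidden layers
print(wide, deep)  # 12001 vs 5701: the deeper net uses fewer than half the parameters
```

A raw count doesn't prove the deep net matches the wide one's accuracy, but it shows where the efficiency argument bites: depth composes features, while widening a single layer pays an input-sized block of weights for every extra unit.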
Interview question: "Why do we need non-linearity?"
Without it, stacked layers compose into a single matrix multiplication plus a bias. The whole network collapses to one linear map, equivalent to plain linear regression, no matter how many layers you add.
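A quick NumPy check of the collapse (the shapes are arbitrary assumptions): two linear layers with no activation in between are exactly one merged linear layer.

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=5)
W2, b2 = rng.normal(size=(2, 5)), rng.normal(size=2)
x = rng.normal(size=3)

# Two stacked linear layers, no activation in between...
two_layers = W2 @ (W1 @ x + b1) + b2

# ...equal a single linear layer with merged parameters.
W, b = W2 @ W1, W2 @ b1 + b2
print(np.allclose(two_layers, W @ x + b))  # True
```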