Backpropagation
Backpropagation is an algorithm, used mainly in supervised learning, that computes the gradient of a loss function with respect to a network’s weights by applying the chain rule backward through the network’s layers.
Backpropagation, often shortened to backprop, is a fundamental algorithm for training artificial neural networks. It was popularized in a 1986 paper by Rumelhart, Hinton, and Williams, though earlier formulations existed. The algorithm enables efficient computation of the gradient of the loss function with respect to each weight in the network, which is then used by an optimization method like stochastic gradient descent to update the weights and minimize the loss.
The process consists of two main phases: a forward pass and a backward pass. During the forward pass, input data is propagated through the network’s layers to produce an output. The loss function then measures the error between this output and the true target. In the backward pass, the algorithm calculates the gradient of the loss with respect to each weight by applying the chain rule of calculus, propagating error signals backward from the output layer to the input layer. This backward flow of gradients allows each weight to be adjusted in proportion to its contribution to the overall error.
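The two phases described above can be sketched for a tiny two-layer network. This is a minimal illustration under stated assumptions, not a reference implementation: the architecture (one tanh hidden layer, a linear output, mean squared error), the layer sizes, and all function names are chosen here for the example. A finite-difference check is included because it is a common way to confirm that a hand-written backward pass matches the true gradient.

```python
import numpy as np

def forward(params, x):
    """Forward pass: return the prediction and the values the backward pass reuses."""
    W1, b1, W2, b2 = params
    h = np.tanh(x @ W1 + b1)      # hidden activation
    y_hat = h @ W2 + b2           # linear output layer
    return y_hat, (x, h)

def loss_fn(y_hat, y):
    """Half the squared error, averaged over the batch."""
    return 0.5 * np.mean(np.sum((y_hat - y) ** 2, axis=1))

def backward(params, cache, y_hat, y):
    """Backward pass: apply the chain rule from the output layer back to the input."""
    W1, b1, W2, b2 = params
    x, h = cache
    d_yhat = (y_hat - y) / x.shape[0]   # dL/dy_hat
    dW2 = h.T @ d_yhat
    db2 = d_yhat.sum(axis=0)
    d_h = d_yhat @ W2.T                 # error signal sent back to the hidden layer
    d_z1 = d_h * (1.0 - h ** 2)         # tanh'(z) = 1 - tanh(z)^2
    dW1 = x.T @ d_z1
    db1 = d_z1.sum(axis=0)
    return dW1, db1, dW2, db2

def numerical_grad(params, x, y, index, eps=1e-6):
    """Central finite-difference gradient w.r.t. params[index], used only as a check."""
    p = params[index]
    grad = np.zeros_like(p)
    for idx in np.ndindex(p.shape):
        old = p[idx]
        p[idx] = old + eps
        loss_plus = loss_fn(forward(params, x)[0], y)
        p[idx] = old - eps
        loss_minus = loss_fn(forward(params, x)[0], y)
        p[idx] = old                    # restore the original value
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad

# Illustrative data and parameters (sizes are arbitrary assumptions).
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
y = rng.normal(size=(5, 2))
params = [rng.normal(size=(3, 4)) * 0.5, np.zeros(4),
          rng.normal(size=(4, 2)) * 0.5, np.zeros(2)]

y_hat, cache = forward(params, x)
grads = backward(params, cache, y_hat, y)

# Largest disagreement between analytic and numerical gradients.
max_err = max(np.max(np.abs(g - numerical_grad(params, x, y, i)))
              for i, g in enumerate(grads))
print("max gradient error:", max_err)
```

The backward pass reuses `h` from the forward pass instead of recomputing it, which is why training frameworks keep intermediate activations in memory until the gradients have been computed.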
Backpropagation is a key enabler of deep learning, as it scales efficiently to networks with many layers and millions of parameters. One backward pass costs only a small constant multiple of a forward pass, so the cost of computing the full gradient grows roughly linearly with the number of weights, making it feasible to train large models. However, it can suffer from issues like vanishing or exploding gradients in very deep networks, which has motivated architectural innovations such as rectified linear units, batch normalization, and residual connections.
Why it matters
Backpropagation is the standard method for training virtually all modern neural networks, from image classifiers to language models. Without it, learning the millions or billions of parameters in deep networks would be computationally intractable. Its efficiency and mathematical soundness have made it a cornerstone of deep learning, enabling breakthroughs in computer vision, natural language processing, and reinforcement learning.
First appeared
Popularized by Rumelhart, Hinton, and Williams, 1986; earlier formulations existed.
Related terms
FAQ
How does it work?
Backpropagation works by first computing the network’s output and the loss during a forward pass. Then, in a backward pass, it applies the chain rule to compute the gradient of the loss with respect to each weight, layer by layer from output to input. These gradients are then used by an optimizer to update the weights and reduce the loss.
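For a fully connected network, the layer-by-layer computation described above can be written as a short recursion. The notation is an assumption made for this sketch and is not defined elsewhere in this entry: layer $l$ has weights $W^{(l)}$, bias $b^{(l)}$, pre-activation $z^{(l)}$, activation $a^{(l)}$ with elementwise nonlinearity $\sigma$, error signal $\delta^{(l)}$, loss $\mathcal{L}$, final layer $L$, and learning rate $\eta$.

```latex
\begin{align}
  % Forward pass
  z^{(l)} &= W^{(l)} a^{(l-1)} + b^{(l)}, \qquad a^{(l)} = \sigma\!\left(z^{(l)}\right) \\
  % Error signal at the output layer
  \delta^{(L)} &= \nabla_{a^{(L)}} \mathcal{L} \odot \sigma'\!\left(z^{(L)}\right) \\
  % Error signal propagated backward, one layer at a time
  \delta^{(l)} &= \left( \left(W^{(l+1)}\right)^{\top} \delta^{(l+1)} \right) \odot \sigma'\!\left(z^{(l)}\right) \\
  % Gradients with respect to the parameters of layer l
  \frac{\partial \mathcal{L}}{\partial W^{(l)}} &= \delta^{(l)} \left(a^{(l-1)}\right)^{\top}, \qquad
  \frac{\partial \mathcal{L}}{\partial b^{(l)}} = \delta^{(l)} \\
  % Plain gradient-descent update (one possible optimizer)
  W^{(l)} &\leftarrow W^{(l)} - \eta \, \frac{\partial \mathcal{L}}{\partial W^{(l)}}
\end{align}
```

Each $\delta^{(l)}$ is computed from $\delta^{(l+1)}$, so one sweep from the output layer to the input layer yields every gradient without repeating any intermediate work.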
What is the difference between forward propagation and backpropagation?
Forward propagation passes input data through the network to produce an output, from which the loss is computed. Backpropagation runs in the opposite direction, propagating error gradients backward to determine how much each weight contributed to the loss. Inference needs only forward propagation; training uses both, because the backward pass depends on values computed during the forward pass.
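The split between the two can be illustrated with a deliberately small model. The sketch below assumes a one-layer linear model with mean squared error, so the backward pass reduces to a single chain-rule step; the names `predict` and `train_step`, the synthetic data, and the learning rate are all hypothetical choices for this example.

```python
import numpy as np

def predict(w, b, x):
    """Forward propagation only: this is all that inference needs."""
    return x @ w + b

def train_step(w, b, x, y, lr=0.1):
    """One training step: forward pass, backward pass, then a gradient-descent update."""
    y_hat = predict(w, b, x)            # forward pass
    err = y_hat - y
    loss = 0.5 * np.mean(err ** 2)
    grad_w = x.T @ err / len(x)         # backward pass: dL/dw
    grad_b = np.mean(err)               # backward pass: dL/db
    return w - lr * grad_w, b - lr * grad_b, loss

# Synthetic regression data generated from known parameters (assumed for the demo).
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = x @ true_w + 0.3

w, b = np.zeros(3), 0.0
losses = []
for _ in range(200):
    w, b, loss = train_step(w, b, x, y)
    losses.append(loss)

print("first loss:", losses[0], "last loss:", losses[-1])
print("prediction for a new input:", predict(w, b, np.array([1.0, 0.0, -1.0])))
```

Once training is finished, only `predict` is called; the gradient computation inside `train_step` is not needed to use the model.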
Why can backpropagation fail in very deep networks?
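In very deep networks, the gradient that reaches an early layer is a product of many per-layer factors, each built from that layer’s weights and activation derivatives. If these factors are mostly smaller than 1 in magnitude, the product shrinks toward zero and the gradient vanishes; if they are mostly larger than 1, it grows very large and the gradient explodes. Vanishing gradients prevent early layers from learning, while exploding gradients can destabilize training. Techniques like careful weight initialization, batch normalization, and skip connections help mitigate these issues.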
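Both effects can be reproduced in a few lines by sending an error signal backward through a stack of random layers and recording its norm. This is a rough sketch under assumptions chosen for the demonstration: the function name `backward_signal_norms`, the width, depth, and weight scales are illustrative, and the exploding case uses linear (identity) activations because sigmoid units saturate and tend to shrink gradients instead.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backward_signal_norms(depth, activation="sigmoid", weight_scale=1.0,
                          width=32, seed=0):
    """Return the norm of the backpropagated signal after each layer,
    ordered from the output side toward the input side."""
    rng = np.random.default_rng(seed)

    # Forward pass: store each layer's weights and local activation derivative.
    a = rng.normal(size=width)
    layers = []
    for _ in range(depth):
        W = rng.normal(size=(width, width)) * weight_scale / np.sqrt(width)
        z = W @ a
        if activation == "sigmoid":
            a = sigmoid(z)
            local_grad = a * (1.0 - a)      # sigmoid'(z), never larger than 0.25
        else:                               # "linear": identity activation
            a = z
            local_grad = np.ones(width)
        layers.append((W, local_grad))

    # Backward pass: start from a unit-norm error signal at the output.
    delta = np.ones(width) / np.sqrt(width)
    norms = []
    for W, local_grad in reversed(layers):
        delta = W.T @ (delta * local_grad)  # one chain-rule step
        norms.append(float(np.linalg.norm(delta)))
    return norms

vanishing = backward_signal_norms(30, activation="sigmoid", weight_scale=1.0)
exploding = backward_signal_norms(30, activation="linear", weight_scale=2.0)

print("sigmoid stack, norm at input side:", vanishing[-1])
print("linear stack with large weights, norm at input side:", exploding[-1])
```

In the sigmoid stack each step scales the signal by a factor well below 1, while in the linear stack with enlarged weights each step roughly doubles it. Keeping these per-layer factors close to 1 is what careful initialization, normalization, and skip connections aim for.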