This is the third article in a series on building a multilayer perceptron (MLP) in Python from scratch. The previous articles can be found here:
- Multilayer perceptron - Building one from scratch in Python
- Softmax and Cross-entropy functions for MLP networks
In this article we replace the initial implementation of backpropagation, which relied on symbolic differentiation, with a version implemented using the Autograd package.
In the previous article, when we wanted to calculate the partial derivatives for every weight in the MLP network, we used symbolic differentiation to obtain closed form expressions. Although that approach produces correct results, finding the closed form solutions for every layer in the network is a tedious, error-prone process. Any change in the network, such as adding or removing nodes or layers, or changing the activation function, requires manually updating all of these expressions.
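To illustrate what such a closed form expression looks like, here is a minimal sketch of a hand-derived gradient, assuming a single sigmoid neuron with a squared error loss rather than the actual layers from the previous article:
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# hand-derived gradient dL/dw for a single sigmoid neuron and the
# squared error loss L = 0.5 * (a - y)**2, where a = sigmoid(w . x);
# the expression has to be re-derived by hand whenever the activation or loss changes
def manual_gradient(w, x, y):
    a = sigmoid(np.dot(w, x))
    return (a - y) * a * (1.0 - a) * x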
To streamline this process, the machine learning community came up with a set of techniques known as automatic differentiation. Automatic differentiation can be viewed as a combination of numerical and symbolic differentiation that automates the computation of function derivatives. In our case, it enables us to perform backpropagation, that is, to compute the values of the partial derivatives of the loss function with respect to every weight in the network, in essentially two lines of Python code:
# one-time import of Autograd's grad function:
from autograd import grad

# build the gradient-returning function:
compute_gradients = grad(self.predict)
# use it to compute the weight gradients:
self.weights_gradients = compute_gradients(self.weights)
The Autograd package is the most popular automatic differentiation package for Python, and it is fairly easy to use in Python programs. All that is needed is to make sure that our forward pass function (in our project that is the predict() function) is written in such a way that it accepts a single parameter, a matrix of network weights, and returns a single scalar value - the loss.
Once this is done, all we need to do is pass predict() to Autograd's grad() function, which returns another function, arbitrarily named compute_gradients() here, that takes the weights and returns the matrix of partial derivatives as its output. Once we obtain this matrix, we can use it to perform gradient descent exactly as in the previous articles.
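To make this concrete, here is a minimal, self-contained sketch that follows the same pattern; the tiny forward pass and the made-up input, target and weight values are assumptions for illustration, not the project's actual predict() method:
import autograd.numpy as np   # Autograd's drop-in wrapper around NumPy
from autograd import grad

x = np.array([0.5, -1.2, 0.3])     # made-up input vector
y = 1.0                            # made-up target value

def predict(weights):
    # forward pass that takes only the weights and returns a scalar loss
    activation = np.tanh(np.dot(weights, x))
    return (activation - y) ** 2

compute_gradients = grad(predict)        # returns a gradient-computing function
weights = np.array([0.1, 0.2, -0.3])
print(compute_gradients(weights))        # array of dloss/dweight values, same shape as weights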
Because of these requirements, we rewrote the predict() function so that it now replaces the following functions (a sketch of the consolidated version follows the list):
- ff_compute_hidden_layer()
- ff_compute_output_layer()
- bp_compute_output_layer_gradients()
- bp_compute_hidden_layer_gradients()
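The sketch below gives a rough idea of what such a consolidated predict() can look like. The class layout, the dictionary of weight matrices, the sigmoid activation and the mean squared error loss are assumptions made for illustration, not necessarily the article's exact implementation:
import autograd.numpy as np   # use Autograd's numpy so it can trace the forward pass
from autograd import grad

class MLP:
    def __init__(self, inputs, targets, weights):
        self.inputs, self.targets, self.weights = inputs, targets, weights

    def predict(self, weights):
        # hidden layer with a sigmoid activation (assumed)
        hidden = 1.0 / (1.0 + np.exp(-np.dot(self.inputs, weights["hidden"])))
        # linear output layer followed by a scalar mean squared error loss (assumed)
        output = np.dot(hidden, weights["output"])
        return np.mean((output - self.targets) ** 2)

    def backpropagate(self):
        compute_gradients = grad(self.predict)            # same two lines as shown earlier
        self.weights_gradients = compute_gradients(self.weights)

# usage with made-up sizes: 5 samples, 4 inputs, 3 hidden nodes, 1 output
net = MLP(inputs=np.random.randn(5, 4),
          targets=np.random.randn(5, 1),
          weights={"hidden": np.random.randn(4, 3), "output": np.random.randn(3, 1)})
net.backpropagate()
print(net.weights_gradients["hidden"].shape)   # (4, 3), same shape as the weight matrix
Because the whole forward pass is written with autograd.numpy, this single predict() is all Autograd needs to trace in order to produce every weight gradient.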
All this results in a smaller codebase that is easier to experiment with. It is fair to say that automatic differentiation makes implementing backpropagation much easier than symbolic differentiation does. That is the core reason why automatic differentiation packages such as Autograd are at the heart of many popular ML packages for Python.
Although automatic differentiation is very powerful and useful, it is good to be aware of the general rules it follows: they are essentially the rules of symbolic differentiation, most importantly the chain rule, which we already applied by hand in previous versions of this series.
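As a tiny sanity check of that point (using a made-up composite function, not the MLP itself), the gradient Autograd returns for sin(x**2) matches the hand-applied chain rule result 2*x*cos(x**2):
import autograd.numpy as np
from autograd import grad

def f(x):
    return np.sin(x ** 2)          # a simple composite function

df = grad(f)                       # automatic differentiation
x = 1.5
print(df(x), 2 * x * np.cos(x ** 2))   # both print the same chain rule value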
If you would like to know more about automatic differentiation, you can take a look at this video lecture by one of the authors of the Autograd package.
The source code for this third article in the MLP series is on GitHub.