# Adding automatic differentiation to a multilayer perceptron in Python

This is the third article in a series on building a multilayer perceptron (MLP) in Python from scratch. The previous articles can be found here:

In this article we are replacing the initial implementation of backpropagation, which was carried out by methods of symbolic differentiation, with a version of backpropagation implemented using Autograd package.

In the previous article, when we wanted to calculate the partial derivatives for every weight in the MLP network, we used methods of symbolic differentiation to obtain closed form expressions. Although that approach provides correct results, finding the closed form solutions for every layer in the network is a tedious process prone to errors. Any change in the network, such as adding or removing nodes or layers, or changing the activation function requires manual update of all these expressions.

In order to streamline this process, machine learning community came up with a set of techniques known as automatic differentiation. Automatic differentiation can be viewed as a combination of numerical and symbolic differentiation that facilitates the process of finding function derivatives. In our case, it enables us to perform backpropagation, or to compute the values of partial derivatives of loss function with respect to every node in the network, in just two lines of Python code:

# compute the gradient-returning function:
# use it to compute weight gradients:
self.weights_gradients =  compute_gradients(self.weights)

Autograd package is the most popular automatic differentiation package for Python. Its use in Python programs is fairly easy. All that's need is to make sure that our forward pass function (in our project that's the predict() function) is written in such way that it accepts a single parameter, a matrix of network weights, and returns a single scalar value - the loss value.

Once this is done, all we need is to call Autograd's function predict() which returns another function, which we arbitrarily named compute_gradients(), that returns a matrix of partial derivatives as its output. Once we obtain this matrix, we can use it to perform gradient descent exactly as in the previous articles.

Because of these requirements, we rewrote the predict() function so it now replaces the following functions:

• ff_compute_hidden_layer()
• ff_compute_output_layer()