Softmax and cross-entropy functions for multilayer perceptron networks

Author: Goran Trlin

This article is a continuation of the previous article where we created a multilayer perceptron (MLP) from scratch. The source code of all articles can be found on GitHub.

The full series of articles on building MLPs in Python from scratch:

1. Multilayer perceptron tutorial - building one from scratch in Python
The first tutorial uses no advanced concepts and relies on two small neural networks, one for circles and one for lines.

2. Softmax and Cross-entropy functions for multilayer perceptron networks
The second tutorial fuses the two neural networks into one and adds the notions of Softmax output and Cross-entropy loss.

3. Adding automatic differentiation to the multilayer perceptron
The third tutorial introduces automatic differentiation which immensely helps with gradient calculations.

Adding Softmax and Cross-entropy

In this article we are introducing two new features to the existing MLP program:

  • The softmax function at the output layer
  • The cross-entropy function as the loss function

Thanks to these two functions, we can use a single neural network instead of the two used in the first article. This time, the output layer of our network contains two outputs, forming an output vector.

The final program asks for an image of a hand-drawn circle or line. If a circle image is provided, the output vector will look like this:

\( [0.999, 0.001] \)

Conversely, if a line is provided, the program output should look something like this:

\( [0.001, 0.999] \)

The elements of the output vector can be seen as the estimated probabilities. These probabilities are generated by applying the softmax function to the output layer values.

The softmax formula

$$ softmax(i) = \frac{e^{\boldsymbol{z}_{i}}} {\sum_{j=1}^{K}e^{\boldsymbol{z}_{j}}} $$

In the formula above, the symbols have the following meanings:

  • \( \boldsymbol{K} \) - total number of output neurons
  • \( \boldsymbol{z} \) - vector of all output neuron values
  • \( \boldsymbol{z}_{i} \) - ith value in the \( \boldsymbol{z} \) vector

It is important to note that all K softmax outputs sum to 1:

$$ \sum_{i=1}^{K}{softmax(i)} = 1 $$

Each output also lies strictly between 0 and 1, so together the outputs can be interpreted as a probability distribution. The higher the value of softmax(i), the higher the estimated probability that output i is the correct one for the test sample.
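As an illustration, softmax can be sketched in a few lines of plain Python. This is a minimal sketch and not necessarily the code used in the article's repository: the function name `softmax` and the sample input values are assumptions made for the example. Subtracting the largest value before exponentiating is a common numerical-stability trick that leaves the result unchanged.

```python
import math

def softmax(z):
    """Return the softmax of a list of output-layer values (illustrative helper)."""
    # Subtracting the maximum avoids overflow in exp() and does not change the result.
    z_max = max(z)
    exps = [math.exp(value - z_max) for value in z]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw output-layer values for a circle image.
probabilities = softmax([4.0, -3.0])
print(probabilities)       # roughly [0.999, 0.001]
print(sum(probabilities))  # 1, up to floating-point rounding
```

The two printed probabilities have the same shape as the output vectors shown earlier: one value per class, summing to 1.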

The cross-entropy loss function

$$ H(p,q) = -\sum_{i}^{outputs}p({o}_i)\log(q(o_{i})) $$

For many applications of cross-entropy, including ours, the base of the logarithm can be set to \( e \). This does not affect the validity of the formula, and it makes calculations easier: the derivative of \( \ln(x) \) is simply \( 1/x \), whereas a logarithm with any other base \( b \) carries an extra constant factor of \( 1/\ln(b) \).

By doing so, the above formula becomes:

$$ H(p,q) = -\sum_{i}^{outputs}p({o}_i)\ln(q(o_{i})) $$

In the formula above, the symbols have the following meanings:

  • \( p({o}_i) \) - real probability distribution vector. Example: \( [1, 0] \) if the learning example is a circle, and \( [0, 1] \) if the learning example is a line.
  • \( q({o}_i) \) - estimated probability distribution vector, produced by the softmax function. Example: \( [0.999, 0.001] \) if the estimate (prediction) is a circle, and \( [0.001, 0.999] \) if the prediction is a line.
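The formula can be checked with a short Python sketch. The function name `cross_entropy` and the `eps` guard against taking the logarithm of zero are assumptions made for this example; they are not taken from the article's source code.

```python
import math

def cross_entropy(p, q, eps=1e-12):
    """Cross-entropy H(p, q) with natural logarithm (illustrative helper).

    p is the real distribution, q is the estimated distribution.
    eps is an assumed safeguard so that ln(0) is never evaluated.
    """
    return -sum(p_i * math.log(max(q_i, eps)) for p_i, q_i in zip(p, q))

# The learning example is a circle, so the real distribution is [1, 0].
loss_good = cross_entropy([1.0, 0.0], [0.999, 0.001])  # confident and correct
loss_bad = cross_entropy([1.0, 0.0], [0.001, 0.999])   # confident and wrong
print(loss_good)  # about 0.001
print(loss_bad)   # about 6.9
```

A prediction close to the real distribution gives a loss near zero, while a confident wrong prediction is penalized heavily, which is what makes this function useful as a training loss.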

Training data

Input features remain in the same format as in the previous article:


Instead of having 2 input files with correct answers, in this tutorial we use only one learning file with the correct answers. The file with correct answers now looks like this:


p_circle and p_line represent the true probability distributions for the training examples.
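To make the meaning of these two columns concrete, the sketch below maps a class name to its true distribution. The helper name `true_distribution` and the lowercase class names are hypothetical and chosen only for illustration; the actual file layout is defined in the article's repository.

```python
def true_distribution(label):
    """Return [p_circle, p_line] for a training example (hypothetical helper)."""
    targets = {
        "circle": [1.0, 0.0],  # p_circle = 1, p_line = 0
        "line": [0.0, 1.0],    # p_circle = 0, p_line = 1
    }
    return targets[label]

print(true_distribution("circle"))  # [1.0, 0.0]
print(true_distribution("line"))    # [0.0, 1.0]
```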

Testing and output

As in the previous tutorial, you can test the predictions with one of the following test image names:

  • tc-1.png (circle)
  • tc-2.png (circle)
  • tc-3.png (circle)
  • tc-4.png (circle)
  • tc-5.png (circle)
  • tl-1.png (line)
  • tl-2.png (line)
  • tl-3.png (line)
  • tl-4.png (line)
  • tl-5.png (line)

The program displays the estimated (predicted) probabilities of the image being a circle or a line, along with the correct label: CIRCLE or LINE.
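If a single predicted class is needed, one simple option is to pick the output with the highest probability. The sketch below assumes the output order [circle, line] used in the examples earlier in this article; the function name `predicted_label` is hypothetical and not part of the original program.

```python
def predicted_label(probabilities):
    """Return the label whose estimated probability is highest (hypothetical helper)."""
    labels = ["CIRCLE", "LINE"]  # assumed order: [p_circle, p_line]
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return labels[best_index]

print(predicted_label([0.999, 0.001]))  # CIRCLE
print(predicted_label([0.001, 0.999]))  # LINE
```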

Related subjects

The following subjects are closely related to the methods presented in this article, and learning about them can be useful when playing with the softmax, cross-entropy and MLPs.