Generating word embeddings using a Skip-gram based neural network in Python

Author: Goran Trlin

In this code example, we generate word (data) embeddings using a tiny neural network (NN) based on the well-known Skip-gram algorithm used in the Word2Vec package. The code for this example is written in Python from scratch, using only basic libraries such as numpy. The NN works with a small vocabulary of only 17 words, the same one we used in the basic RNN example. Skip-gram's window size parameter is set to 1.
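
To illustrate what the window size parameter means, here is a minimal sketch of building (center, context) training pairs with a window size of 1. The mini-corpus and variable names below are illustrative and are not taken from the project code:

    # Illustrative mini-corpus; the project uses its own 17-word vocabulary.
    corpus = "the quick brown fox jumps over the lazy dog".split()

    # Map each unique word to an index (its one-hot position).
    vocab = sorted(set(corpus))
    word_to_idx = {w: i for i, w in enumerate(vocab)}

    # With window size 1, every word is paired with its immediate
    # left and right neighbours as (center, context) training pairs.
    window = 1
    pairs = []
    for i, center in enumerate(corpus):
        for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
            if j != i:
                pairs.append((word_to_idx[center], word_to_idx[corpus[j]]))

    print(pairs[:5])  # [(7, 6), (6, 7), (6, 0), (0, 6), (0, 2)]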

Full neural network structure:

  • 1 x 17 input neurons
  • 1 x 8 hidden layer neurons (embedding size = 8)
  • 1 x 17 output layer neurons with a categorical softmax activation
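
A rough numpy-only sketch of a forward pass through this structure is shown below. The weight shapes follow the sizes listed above, but the variable names and random initialization are illustrative, and the project itself uses autograd rather than this plain-numpy code:

    import numpy as np

    VOCAB_SIZE = 17      # input and output layer size
    EMBEDDING_SIZE = 8   # hidden layer size

    rng = np.random.default_rng(0)
    W1 = rng.normal(scale=0.1, size=(VOCAB_SIZE, EMBEDDING_SIZE))   # input -> hidden
    W2 = rng.normal(scale=0.1, size=(EMBEDDING_SIZE, VOCAB_SIZE))   # hidden -> output

    def softmax(z):
        z = z - z.max()            # subtract max for numerical stability
        e = np.exp(z)
        return e / e.sum()

    def forward(word_index):
        x = np.zeros(VOCAB_SIZE)
        x[word_index] = 1.0        # one-hot encoding of the center word
        h = x @ W1                 # hidden activations = that word's embedding row
        y = softmax(h @ W2)        # probability of each word appearing in the context
        return h, y

    h, y = forward(3)
    print(h.shape, y.shape)        # (8,) (17,)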

Key notes:

  • Generally, Skip-gram is considered a self-supervised learning algorithm, since we are not interested in the network's outputs but in the data embeddings, extracted from the learned weights between the input and hidden layers.
  • Automatic differentiation (the autograd package) is used here to compute the gradients (backpropagation).
  • The input text file is very small, but still large enough for the purposes of this demonstration.
  • After the NN is trained, word embeddings are extracted from the weights between every input neuron and every neuron in the hidden layer. This means that the resulting embeddings are 8-dimensional vectors, and that there are 17 of them, one per vocabulary word (see the sketch after this list).
  • It is important to notice that words which appear in the same contexts many times end up having high cosine similarity, while words that do not appear in the same contexts have low cosine similarity.
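
As a minimal sketch of these last two points, assuming a trained input-to-hidden weight matrix W1 of shape 17 x 8 and a word_to_idx mapping like the one above (the names and the example word pair are illustrative, not the project's actual code):

    import numpy as np

    def embedding_for(word, W1, word_to_idx):
        # Row i of W1 holds the weights from input neuron i to the 8 hidden
        # neurons, i.e. the 8-dimensional embedding of vocabulary word i.
        return W1[word_to_idx[word]]

    def cosine_similarity(a, b):
        # Cosine of the angle between two embedding vectors, in [-1, 1].
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Example usage after training:
    # v1 = embedding_for("cat", W1, word_to_idx)
    # v2 = embedding_for("dog", W1, word_to_idx)
    # print(cosine_similarity(v1, v2))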

The code for this project can be found on GitHub.

Related links