Continuous Bag of Words model
The design of the neural network to predict a word given its surrounding context is shown in the following figure:

The input layer receives the context while the output layer predicts the target word. The CBOW model has three layers: an input layer, a hidden layer (also called the projection layer or embedding layer), and an output layer. In our setting, the vocabulary size is V and the hidden layer size is N. Units in adjacent layers are fully connected.
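To make the three layers concrete, here is a minimal NumPy sketch of the CBOW forward pass. The dimensions, the weight names `W_in`/`W_out`, and the averaging of context embeddings are illustrative assumptions, not the book's exact code:

```python
import numpy as np

V, N = 10, 4                         # vocabulary size and hidden (embedding) size
rng = np.random.default_rng(0)
W_in = rng.standard_normal((V, N))   # input-to-hidden weights: the embedding matrix
W_out = rng.standard_normal((N, V))  # hidden-to-output weights

def cbow_forward(context_indices):
    """Predict a distribution over the target word from context word indices."""
    h = W_in[context_indices].mean(axis=0)  # projection layer: average of context embeddings
    scores = h @ W_out                      # output layer: one score per vocabulary word
    exp = np.exp(scores - scores.max())     # numerically stable softmax
    return exp / exp.sum()

probs = cbow_forward([1, 3, 5, 7])          # four context words around the target
```

`probs` is a length-V vector of positive values summing to 1, from which the most probable target word can be read off with `probs.argmax()`.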
The input and the output can each be represented either by an index (an integer, 0-dimensional) or by a one-hot encoding vector (1-dimensional, with a 1 in the position of the index and 0 elsewhere). Multiplying the embedding matrix W by the one-hot encoding vector v_j simply selects the j-th row of the embedding matrix:

v_j^T W = W_j (the j-th row of W)
Since the index representation is more memory-efficient than the one-hot encoding, and Theano supports indexing symbolic variables, it is preferable to adopt the index representation whenever possible.
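The equivalence of the two representations is easy to check numerically. In this NumPy sketch (the matrix W and the index j are illustrative), integer indexing returns exactly the row that the one-hot product computes, while storing a single integer instead of a length-V vector:

```python
import numpy as np

V, N = 6, 3
rng = np.random.default_rng(1)
W = rng.standard_normal((V, N))  # embedding matrix: one row per vocabulary word

j = 2
one_hot = np.zeros(V)
one_hot[j] = 1.0

# Multiplying by the one-hot vector selects the j-th row of W...
by_product = one_hot @ W
# ...which is exactly what integer indexing returns, without the O(V*N) multiply.
by_index = W[j]
```

Theano's symbolic variables support the same indexing syntax (e.g. `W[idx]` on a shared matrix), which is what makes the index representation the natural choice in the code that follows.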
Therefore, input (context...