As we said before, neural network models consist of a number of sequential layers, which is why Keras has a class called Sequential that we can use to instantiate a neural network model:
from keras.models import Sequential
nn_reg = Sequential()
Good! We have created an empty neural network called nn_reg. Now we have to add layers to it. We will use what are known as fully connected or dense layers: these are layers whose neurons are connected to all the neurons in the previous layer. In other words, every neuron in a dense layer receives the output of every neuron in the previous layer. Our MLP will be made of dense layers. Let's import the Dense class:
from keras.layers import Dense
As we discussed in our conceptual section, the first layer in an MLP is always the input layer, the one that receives the feature values and passes them to the first hidden layer. In Keras, however, there is no need to create the input layer explicitly, because this layer is essentially the features themselves. So you will not see the input layer in the code, but conceptually it is there. With that point clear, the first layer we will add to our empty neural network is the first hidden layer. This layer is special because we need to specify the shape of the input (as a tuple): the Keras documentation states that the first layer in a Sequential model, and only the first, needs to receive information about its input shape, because the following layers can perform automatic shape inference. Now, we will add this first layer:
n_input = X_train.shape[1]
n_hidden1 = 32
# adding first hidden layer
nn_reg.add(Dense(units=n_hidden1, activation='relu', input_shape=(n_input,)))
Let's understand each of the parameters:
- units: This is the number of neurons in the layer; we are using 32
- activation: This is the activation function that will be used in each of the neurons; we are using ReLU
- input_shape: This is a tuple containing the number of inputs the network will receive, which is equal to the number of predictive features in our dataset; we don't need to specify how many samples the network will receive, because the model can handle any number of them
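As a side note, the Sequential class also accepts a list of layers in its constructor, so the same first hidden layer could be declared when the model is created. The following is only an illustrative sketch using the Sequential and Dense classes we imported above (the name nn_alt is arbitrary and not part of our model); either way, only this first layer needs input_shape, and any layer added afterwards infers its input size automatically:
# illustrative alternative: passing the first layer directly to Sequential
nn_alt = Sequential([
    Dense(units=n_hidden1, activation='relu', input_shape=(n_input,))
])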
Our neural network now has one hidden layer; since this is a relatively simple problem and we have a relatively small dataset, we will add only two more hidden layers, for a total of three. Few people would call this a deep learning model, since it has only three hidden layers, but the process of building and training is essentially the same with three or with 300 hidden layers. This is our first neural network model, so I consider it a good start. Let's add the next two hidden layers:
n_hidden2 = 16
n_hidden3 = 8
# add second hidden layer
nn_reg.add(Dense(units=n_hidden2, activation='relu'))
# add third hidden layer
nn_reg.add(Dense(units=n_hidden3, activation='relu'))
Notice the number of units we are using in the successive layers: 32, 16, and 8. First, we are using powers of 2, which is a common practice in the field; second, we are shaping our network as a funnel, going from 32 units down to 8. There is nothing special about this shape, but empirically it sometimes works very well. Another common approach is to use the same number of neurons in every hidden layer.
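If you ever want to experiment with different funnel shapes, the same stack of hidden layers can also be built with a simple loop over the layer sizes. This is only an illustrative sketch (nn_funnel and hidden_units are arbitrary names, not part of our model):
# illustrative sketch: building the 32 -> 16 -> 8 funnel with a loop
hidden_units = [32, 16, 8]
nn_funnel = Sequential()
nn_funnel.add(Dense(units=hidden_units[0], activation='relu', input_shape=(n_input,)))
for units in hidden_units[1:]:
    nn_funnel.add(Dense(units=units, activation='relu'))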
To finish our neural network, we need to add the final layer: the output layer. Since this is a regression problem, we want a single output per sample (the predicted price), so we need to add a layer that connects the 8 outputs of the previous layer to one output that will give us the price prediction. In this last layer, there is no need for an activation function, since we want the raw value as the final prediction:
# output layer
nn_reg.add(Dense(units=1, activation=None))
Great! Our model architecture has been defined. Our neural network, just like the other models we built before, is a function that takes the values of the 21 features and produces one number as output: the predicted price.
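If you want to verify the architecture, Keras models have a summary method that prints every layer together with its output shape and number of parameters. Since we have 21 input features, the parameter counts follow directly from the layer sizes (weights plus biases):
nn_reg.summary()
# expected parameter counts per layer (weights + biases):
# hidden layer 1: 21*32 + 32 = 704
# hidden layer 2: 32*16 + 16 = 528
# hidden layer 3: 16*8 + 8 = 136
# output layer: 8*1 + 1 = 9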
Our neural network has been built. In fact, if you feed it data, you will already get price predictions; here are the predictions for the first 5 diamonds in the training set:
nn_reg.predict(X_train.iloc[:5,:])
The output will be as follows:
These are the price predictions, and they are, of course, very bad predictions. Why is this? Because every neuron in our network starts with randomly initialized weights (and biases that are all initialized as zeros). By default, Keras initializes the weights with the Glorot uniform initializer, also called the Xavier uniform initializer (Glorot & Bengio, 2010), which is one of the most popular ways of initializing neural networks and has proven very useful in practice. There are other initialization schemes, but a discussion of those is outside our scope. We are going to trust the good and smart developers of Keras and use their defaults.
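Although we will keep the defaults, it is worth knowing that the Dense class accepts kernel_initializer and bias_initializer arguments, so the initialization scheme can be made explicit or changed. The following sketch simply spells out the defaults for our first hidden layer; it is equivalent to the layer we added above:
# sketch only: the default initializers written out explicitly
Dense(units=n_hidden1, activation='relu', input_shape=(n_input,),
      kernel_initializer='glorot_uniform', bias_initializer='zeros')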
Now it is time to start modifying these random weights and biases, little by little, using our training data; this is where we enter the training loop.