There is a simpler way to code what we built in the previous section: we can use Keras. With Keras we can rely on a backprop implementation that is correct and tuned for numerical stability, and we gain access to a richer set of features and algorithms that can improve the learning process. Before we begin optimizing the MLP's hyperparameters, let us show the equivalent implementation in Keras. The following code reproduces the same model, with almost the same loss function and almost the same backprop methodology:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
mlp = Sequential()
mlp.add(Dense(3, input_dim=2, activation='sigmoid'))  # hidden layer: 3 neurons, 2 inputs
mlp.add(Dense(2, activation='sigmoid'))               # output layer: 2 neurons
mlp.compile(loss='mean_squared_error',
            optimizer='sgd',
            metrics=['accuracy'])
# This assumes that you still have X, y from earlier
# when we called...
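Since this excerpt ends before the earlier call producing X and y is shown, the following is a minimal, self-contained sketch of how training could proceed. The synthetic X and y here are hypothetical stand-ins for the data from the earlier section, with shapes chosen to match the 2-input, 2-output network above, and the epochs and batch_size values are illustrative rather than the settings used earlier:

import numpy as np

# Hypothetical stand-in data; in the text, X and y come from the earlier section.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                  # 100 samples, 2 features
labels = (X[:, 0] * X[:, 1] > 0).astype(int)   # XOR-like class labels
y = np.eye(2)[labels]                          # one-hot targets, shape (100, 2)

# Train with plain SGD, then report loss/accuracy on the training data
mlp.fit(X, y, epochs=500, batch_size=16, verbose=0)
loss, accuracy = mlp.evaluate(X, y, verbose=0)
print(f"loss={loss:.4f}, accuracy={accuracy:.4f}")

Note that the string 'sgd' selects the SGD optimizer with its Keras defaults (a learning rate of 0.01 and no momentum), which may differ from the learning rate we used in the hand-coded version.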