Answer to question 1: There are many ways to solve this problem:
- A ⊕ B = (A ∧ ¬B) ∨ (¬A ∧ B)
- A ⊕ B = (A ∨ B) ∧ ¬(A ∧ B)
- A ⊕ B = (A ∨ B) ∧ (¬A ∨ ¬B), and so on
If we go with the first approach, the resulting ANNs would look like this:
Now, from the computer science literature, we know that the XOR operation takes two binary inputs and produces one output. With inputs (0, 0) or (1, 1) the network outputs 0, and with inputs (0, 1) or (1, 0) it outputs 1. So we can formally represent this behavior with the following truth table:
X0 | X1 | Y
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0
Here, each pattern must be classified into one of two classes, but no single line L can separate the 0-outputs from the 1-outputs of this truth table. Patterns like these are known as linearly non-separable patterns, which is exactly why a single-layer perceptron cannot solve XOR and why we need the decompositions listed above.
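To convince ourselves that the decompositions really compute XOR, here is a small illustrative Java check (not part of the original solution) that compares the first identity against Java's built-in ^ operator over the full truth table:

public class XorDecompositionCheck {
    public static void main(String[] args) {
        boolean[] values = {false, true};
        for (boolean a : values) {
            for (boolean b : values) {
                boolean xor = a ^ b;                          // built-in XOR
                boolean decomposed = (a && !b) || (!a && b);  // (A ∧ ¬B) ∨ (¬A ∧ B)
                System.out.println(a + ", " + b + " -> XOR = " + xor
                        + ", decomposition = " + decomposed);
            }
        }
    }
}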
Answer to question 2: The most significant progress in ANNs and DL can be described with the following timeline. We have already seen how artificial neurons and perceptrons provided the foundation in 1943 and 1958, respectively. Then, in 1969, Minsky et al. formulated XOR as a linearly non-separable problem. Later, in 1974, Werbos demonstrated the backpropagation algorithm for training perceptrons.
The most significant advances, however, happened in the 1980s: John Hopfield proposed the Hopfield network in 1982, and Hinton, one of the godfathers of neural networks and deep learning, and his team proposed the Boltzmann machine in 1985. Probably the biggest breakthrough came in 1986, when Hinton et al. successfully trained the MLP with backpropagation and Jordan et al. proposed RNNs. In the same year, Smolensky also introduced the RBM (originally called the Harmonium).
In the 1990s, the most significant year was 1997. LeCun et al. proposed LeNet in 1990, and Hochreiter and Schmidhuber proposed LSTM in 1997. In the same year, Schuster et al. proposed the bidirectional RNN, an improved version of the original RNN.
Despite significant advances in computing, not much progress was made between 1997 and 2005, until Hinton struck again in 2006, when he and his team proposed the DBN by stacking multiple RBMs. Then, in 2012, Hinton and his team invented dropout, which significantly improved regularization and reduced overfitting in DNNs.
After that, in 2014, Ian Goodfellow et al. introduced GANs, a significant milestone in image generation. In 2017, Hinton et al. proposed CapsNets to overcome the limitations of regular CNNs, which is so far one of the most significant milestones.
Answer to question 3: Yes, you can use other deep learning frameworks described in the Deep learning frameworks section. However, since this book is about using Java for deep learning, I would suggest going for DeepLearning4J. We will see how flexibly we can create networks by stacking input, hidden, and output layers using DeepLearning4J in the next chapter.
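As a rough, hedged preview of that flexibility (a minimal sketch using the standard DeepLearning4J builder API; the layer sizes 10, 16, and 2 and the seed are placeholder values rather than the configuration used in the next chapter), stacking an input, a hidden, and an output layer looks roughly like this:

import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(1234) // placeholder seed for reproducibility
        .list()
        .layer(0, new DenseLayer.Builder().nIn(10).nOut(16) // hidden layer (placeholder sizes)
                .activation(Activation.RELU).build())
        .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nIn(16).nOut(2) // output layer for two classes
                .activation(Activation.SOFTMAX).build())
        .build();

MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init(); // the network is now ready to be trained with model.fit(...)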
Answer to question 4: Yes, you can, since the title contained in a passenger's name (for example, Mr., Mrs., Miss, Master, and so on) could be significant too. For example, we can imagine that being a woman (that is, Mrs.) or a young boy (that is, Master) could give a higher chance of survival.
Even after watching the famous movie Titanic (1997), we can imagine that a girl in a relationship might have a good chance of survival, since her boyfriend would try to save her! Anyway, this is just imagination, so do not take it too seriously. Now, we can write a user-defined function (UDF) to encode the title using Apache Spark. Let's take a look at the following UDF in Java:
private static final UDF1<String, Option<String>> getTitle = (String name) -> {
    if (name.contains("Mr.")) { // If it has Mr.
        return Some.apply("Mr.");
    } else if (name.contains("Mrs.")) { // Or if it has Mrs.
        return Some.apply("Mrs.");
    } else if (name.contains("Miss.")) { // Or if it has Miss.
        return Some.apply("Miss.");
    } else if (name.contains("Master.")) { // Or if it has Master.
        return Some.apply("Master.");
    } else { // None of the above
        return Some.apply("Untitled");
    }
};
Next, we register the preceding UDF and use it to extract the titles, as follows:
spark.sqlContext().udf().register("getTitle", getTitle, DataTypes.StringType);
Dataset<Row> categoricalDF = df.select(callUDF("getTitle", col("Name")).alias("Name"),
        col("Sex"), col("Ticket"), col("Cabin"), col("Embarked"));
categoricalDF.show();
The resulting column would look like this:
Answer to question 5: For many problems, you can start with just one or two hidden layers. Two hidden layers with the same total number of neurons (keep reading to get an idea of how many neurons to use) will often work just fine, in roughly the same training time. Now let's look at some naïve heuristics for choosing the number of hidden layers:
- 0: Only capable of representing linear separable functions
- 1: Can approximate any function that contains a continuous mapping from one finite space to another
- 2: Can represent an arbitrary decision boundary to arbitrary accuracy
However, for a more complex problem, you can gradually ramp up the number of hidden layers until you start overfitting the training set. Likewise, you can gradually increase the number of neurons until the network starts overfitting. This gives an upper bound on the number of hidden neurons, Nh, that will not result in overfitting:

Nh = Ns / (α * (Ni + No))

In the preceding equation:
- Ni = number of input neurons
- No = number of output neurons
- Ns = number of samples in training dataset
- α = an arbitrary scaling factor, usually 2-10
Note that the preceding equation does not come from any research but from my personal working experience.
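For example, a quick back-of-the-envelope calculation of this upper bound (the input, output, and sample counts below are purely illustrative placeholder values) could look like this:

public class HiddenNeuronUpperBound {
    public static void main(String[] args) {
        int ni = 10;        // number of input neurons (placeholder value)
        int no = 2;         // number of output neurons (placeholder value)
        int ns = 891;       // number of training samples (placeholder value)
        double alpha = 2.0; // arbitrary scaling factor, usually between 2 and 10

        double upperBound = ns / (alpha * (ni + no));
        System.out.println("Upper bound on hidden neurons = " + upperBound); // prints 37.125
    }
}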
Answer to question 6: Of course, we can. We can cross-validate the training and create a grid search technique for finding the best hyperparameters. Let's give it a try.
First, we have the layers defined. Unfortunately, we cannot cross-validate the layers parameter; this is probably either a bug or an intentional restriction on the Spark side. So we stick to a single layer configuration:
int[] layers = new int[] {10, 16, 16, 2};
Then we create the trainer and set only the layer and seed parameters:
MultilayerPerceptronClassifier mlp = new MultilayerPerceptronClassifier()
        .setLayers(layers)
        .setSeed(1234L);
We search through the MLP's different hyperparameters for the best model:
ParamMap[] paramGrid = new ParamGridBuilder()
        .addGrid(mlp.blockSize(), new int[] {32, 64, 128})
        .addGrid(mlp.maxIter(), new int[] {10, 50})
        .addGrid(mlp.tol(), new double[] {1E-2, 1E-4, 1E-6})
        .build();

MulticlassClassificationEvaluator evaluator = new MulticlassClassificationEvaluator()
        .setLabelCol("label")
        .setPredictionCol("prediction");
We then set up the cross-validator and perform 10-fold cross-validation:
int numFolds = 10;
CrossValidator crossval = new CrossValidator()
        .setEstimator(mlp)
        .setEvaluator(evaluator)
        .setEstimatorParamMaps(paramGrid)
        .setNumFolds(numFolds);
Then we perform training using the cross-validated model:
CrossValidatorModel cvModel = crossval.fit(trainingData);
Then we use the cross-validated model to make predictions on the validation set, as follows:
Dataset<Row> predictions = cvModel.transform(validationData);
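The four evaluators referenced below (evaluator1 through evaluator4) were created in the previous example and are not repeated here; assuming they use Spark's MulticlassClassificationEvaluator with the standard metric names, they would look roughly like this:

MulticlassClassificationEvaluator evaluator1 = new MulticlassClassificationEvaluator()
        .setLabelCol("label").setPredictionCol("prediction").setMetricName("accuracy");
MulticlassClassificationEvaluator evaluator2 = new MulticlassClassificationEvaluator()
        .setLabelCol("label").setPredictionCol("prediction").setMetricName("weightedPrecision");
MulticlassClassificationEvaluator evaluator3 = new MulticlassClassificationEvaluator()
        .setLabelCol("label").setPredictionCol("prediction").setMetricName("weightedRecall");
MulticlassClassificationEvaluator evaluator4 = new MulticlassClassificationEvaluator()
        .setLabelCol("label").setPredictionCol("prediction").setMetricName("f1");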
Now we can compute and show the performance metrics, similar to our previous example:
double accuracy = evaluator1.evaluate(predictions);
double precision = evaluator2.evaluate(predictions);
double recall = evaluator3.evaluate(predictions);
double f1 = evaluator4.evaluate(predictions);
// Print the performance metrics
System.out.println("Accuracy = " + accuracy);
System.out.println("Precision = " + precision);
System.out.println("Recall = " + recall);
System.out.println("F1 = " + f1);
System.out.println("Test Error = " + (1 - accuracy));
>>>Accuracy = 0.7810132575757576
Precision = 0.7810132575757576
Recall = 0.7810132575757576
F1 = 0.7810132575757576
Test Error = 0.21898674242424243