[box type="note" align="" class="" width=""]This article is an excerpt from a book written by Fabio M. Soares and Alan M. F. Souza, titled Neural Network Programming with Java Second Edition. This book covers the current state-of-art in the field of neural network that helps you understand and design basic to advanced neural networks with Java.[/box]
Our article explores the power of neural networks in pattern recognition by showcasing how to recognize digits from 0 to 9 in an image.
For pattern recognition, the neural network architectures that can be applied are MLPs (supervised) and the Kohonen network (unsupervised). In the first case, the problem should be set up as a classification problem; that is, the data should be transformed into an X-Y dataset, where every data record in X has a corresponding class in Y. The output of the neural network for classification problems should cover all of the possible classes, which may require preprocessing of the output records.
In the other case, unsupervised learning, there is no need to apply labels to the output; however, the input data should be properly structured. As a reminder, the schemas of both neural networks are shown in the next figure:
We have to deal with all possible types of data, i.e., numerical (continuous and discrete) and categorical (ordinal or unscaled).
However, here we have the possibility of performing pattern recognition on multimedia content, such as images and videos. So, can multimedia be handled? The answer lies in the way this content is stored in files. Images, for example, are written as a grid of small colored points called pixels. Each color can be coded in RGB notation, where the intensities of red, green, and blue define every color the human eye is able to see. Therefore, an image of dimension 100x100 would have 10,000 pixels, each one having three values for red, green, and blue, yielding a total of 30,000 points. That is the challenge for image processing in neural networks.
Some methods may reduce this huge number of dimensions. Afterwards, an image can be treated as a big matrix of continuous numerical values.
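To make this concrete, the conversion from colored pixels to a numerical matrix can be done with the JDK alone. The following sketch (the class and method names are ours, for illustration only) loads an image and collapses each RGB pixel into a single gray level:

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class GrayscaleExtractor {
    // Loads an image file and returns its pixels as a matrix of gray levels (0-255)
    public static int[][] toGrayscaleMatrix(String path) throws IOException {
        BufferedImage img = ImageIO.read(new File(path));
        int[][] gray = new int[img.getHeight()][img.getWidth()];
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF; // red intensity
                int g = (rgb >> 8) & 0xFF;  // green intensity
                int b = rgb & 0xFF;         // blue intensity
                gray[y][x] = (r + g + b) / 3; // collapse three values into one gray level
            }
        }
        return gray;
    }
}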
For simplicity, we use only grayscale images with small dimensions in this article.
Many documents are now being scanned and stored as images, making it necessary to convert these documents back into text so that a computer can edit and process them. However, this conversion involves a number of challenges.
In spite of that, humans can easily interpret and read text even in low-quality images. This can be explained by the fact that humans are already familiar with the characters and words of their language. Somehow, the algorithm must become acquainted with these elements (characters, digits, signs, and so on) in order to successfully recognize text in images.
Although a variety of OCR tools are available on the market, properly recognizing text in images remains a big challenge for an algorithm. So, we will restrict our application to a smaller domain, where we face simpler problems. In this article, we are going to implement a neural network that recognizes the digits from 0 to 9 represented in images. For the sake of simplicity, the images will have small, standardized dimensions.
We applied a standard dimension of 10x10 (100 pixels) in grayscale images, resulting in 100 grayscale values for each image:
In the preceding image, we have a sketch representing the digit 3 on the left and the corresponding matrix of grayscale values for the same digit on the right.
We apply this preprocessing in order to represent all ten digits in this application.
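Continuing the illustrative GrayscaleExtractor sketch above, each 10x10 matrix can then be flattened into a single pattern. The helper below (again, a name of ours) also scales each gray level from 0-255 down to the 0-1 range, which corresponds to the data-normalization step in the training code later on:

// Flattens a 10x10 grayscale matrix into one 100-value input pattern,
// scaling each pixel from 0-255 down to 0.0-1.0
public static double[] toInputPattern(int[][] gray) {
    double[] pattern = new double[gray.length * gray[0].length];
    int k = 0;
    for (int[] row : gray) {
        for (int pixel : row) {
            pattern[k++] = pixel / 255.0; // normalize to [0, 1]
        }
    }
    return pattern;
}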
To recognize optical characters, we produced the data to train and test the neural network ourselves. In this example, grayscale values from 0 (black) to 255 (white) were considered. Based on the pixel layout, two versions of each digit's data were created: one for training and another for testing. Classification techniques will be used here.
The digits from zero to nine were drawn in Microsoft Paint®. The images were then converted into matrices, some examples of which are shown in the following image. All pixel values are grayscale:
For each digit we generated five variations: one is the perfect digit, and the others contain noise introduced either by the drawing or by the image quality.
The rows of each matrix were concatenated into vectors (Dtrain and Dtest) to form the patterns used to train and test the neural network. Therefore, the input layer of the neural network is composed of 100 neurons (101 counting the bias input).
The output dataset was represented by ten patterns, each with a single expressive value (one) while the rest of the values are zero; this is a one-hot encoding. Therefore, the output layer of the neural network will have ten neurons.
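Producing such a target pattern is a one-liner per digit. A minimal static helper in the same illustrative style (the name is ours; any fixed digit-to-neuron mapping works, as long as it is used consistently for training and testing):

// Builds a one-hot target vector for a digit: a single 1.0 among ten 0.0 values.
// Here the 1.0 is placed at index 'digit'; what matters is that the same
// digit-to-neuron mapping is used for both training and testing.
public static double[] toTargetPattern(int digit) {
    double[] target = new double[10];
    target[digit] = 1.0;
    return target;
}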
So, in this application, our neural network will have 100 inputs (for images with a 10x10 pixel size) and ten outputs, with the number of hidden neurons left unrestricted. We created a class called DigitExample to handle this application. The remaining parameters were chosen experimentally: as has been done in other cases presented previously, let's find the best neural network topology by training several nets. The strategy for doing that is summarized in the following table:
| Experiment | Learning rate | Activation functions |
|---|---|---|
| #1 | 0.3 | Hidden layer: SIGLOG; Output layer: LINEAR |
| #2 | 0.5 | Hidden layer: SIGLOG; Output layer: LINEAR |
| #3 | 0.8 | Hidden layer: SIGLOG; Output layer: LINEAR |
| #4 | 0.3 | Hidden layer: HYPERTAN; Output layer: LINEAR |
| #5 | 0.5 | Hidden layer: HYPERTAN; Output layer: LINEAR |
| #6 | 0.8 | Hidden layer: HYPERTAN; Output layer: LINEAR |
| #7 | 0.3 | Hidden layer: HYPERTAN; Output layer: SIGLOG |
| #8 | 0.5 | Hidden layer: HYPERTAN; Output layer: SIGLOG |
| #9 | 0.8 | Hidden layer: HYPERTAN; Output layer: SIGLOG |
The following code from the DigitExample class shows how the neural network is created and trained on the digit data:
// enter neural net parameters via keyboard (omitted)
// load dataset from external file (omitted)
// data normalization (omitted)

// create the ANN and define the parameters to TRAIN it:
Backpropagation backprop = new Backpropagation(nn, neuralDataSetToTrain, LearningAlgorithm.LearningMode.BATCH);
backprop.setLearningRate(typedLearningRate);
backprop.setMaxEpochs(typedEpochs);
backprop.setGeneralErrorMeasurement(Backpropagation.ErrorMeasurement.SimpleError);
backprop.setOverallErrorMeasurement(Backpropagation.ErrorMeasurement.MSE);
backprop.setMinOverallError(0.001); // stop once the overall error falls below this threshold
backprop.setMomentumRate(0.7);
backprop.setTestingDataSet(neuralDataSetToTest);
backprop.printTraining = true;
backprop.showPlotError = true;

// train the ANN:
try {
    backprop.forward();
    //neuralDataSetToTrain.printNeuralOutput();
    backprop.train();
    System.out.println("End of training");
    if (backprop.getMinOverallError() >= backprop.getOverallGeneralError()) {
        System.out.println("Training successful!");
    } else {
        System.out.println("Training was unsuccessful");
    }
    System.out.println("Overall Error: " + String.valueOf(backprop.getOverallGeneralError()));
    System.out.println("Min Overall Error: " + String.valueOf(backprop.getMinOverallError()));
    System.out.println("Epochs of training: " + String.valueOf(backprop.getEpoch()));
} catch (NeuralException ne) {
    ne.printStackTrace();
}

// test ANN (omitted)
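The omitted test step can be sketched by reusing only the calls that already appear above: a Backpropagation object wrapping the test set, forward() to propagate the patterns, and printNeuralOutput() to inspect the estimates. This is an illustrative sketch under those assumptions, not the book's exact code:

// test ANN (sketch): propagate the test patterns and print the estimated outputs
try {
    Backpropagation tester = new Backpropagation(nn, neuralDataSetToTest,
            LearningAlgorithm.LearningMode.BATCH);
    tester.forward();
    neuralDataSetToTest.printNeuralOutput();
} catch (NeuralException ne) {
    ne.printStackTrace();
}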
After running each experiment using the DigitExample class, and recording the training and testing overall errors together with the number of correct classifications on the test data (shown in the following table), it is possible to observe that experiments #2 and #4 have the lowest MSE values. The differences between these two experiments are the learning rate and the activation function used in the hidden layer.
| Experiment | Training overall error | Testing overall error | Correct classifications |
|---|---|---|---|
| #1 | 9.99918E-4 | 0.01221 | 2 of 10 |
| #2 | 9.99384E-4 | 0.00140 | 5 of 10 |
| #3 | 9.85974E-4 | 0.00621 | 4 of 10 |
| #4 | 9.83387E-4 | 0.02491 | 3 of 10 |
| #5 | 9.99349E-4 | 0.00382 | 3 of 10 |
| #6 | 273.70 | 319.74 | 2 of 10 |
| #7 | 1.32070 | 6.35136 | 5 of 10 |
| #8 | 1.24012 | 4.87290 | 7 of 10 |
| #9 | 1.51045 | 4.35602 | 3 of 10 |
The following figure shows the MSE evolution (train and test) over the epochs for experiment #2. It is interesting to notice that the curve stabilizes near the 30th epoch:
The same graphical analysis was performed for experiment #8; there, the MSE curve stabilizes near the 200th epoch.
As already explained, MSE values alone are not enough to attest to a neural net's quality. Accordingly, the test dataset was used to verify the neural network's generalization capacity. The next tables compare the real outputs (with noise) against the outputs estimated by the neural networks of experiments #2 and #8. They show that the network weights of experiment #8 recognize seven digit patterns correctly, against five for experiment #2; a sketch of the rule used to mark each row OK or ERR follows the tables:
**Output comparison**

| Real output (test dataset) | Digit |
|---|---|
| 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 | 0 |
| 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 | 1 |
| 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 | 2 |
| 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 | 3 |
| 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 | 4 |
| 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 | 5 |
| 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 | 6 |
| 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 | 7 |
| 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 | 8 |
| 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 | 9 |

| Estimated output (test dataset) – Experiment #2 | Digit |
|---|---|
| 0.20 0.26 0.09 -0.09 0.39 0.24 0.35 0.30 0.24 1.02 | 0 (OK) |
| 0.42 -0.23 0.39 0.06 0.11 0.16 0.43 0.25 0.17 -0.26 | 1 (ERR) |
| 0.51 0.84 -0.17 0.02 0.16 0.27 -0.15 0.14 -0.34 -0.12 | 2 (ERR) |
| -0.20 -0.05 -0.58 0.20 -0.16 0.27 0.83 -0.56 0.42 0.35 | 3 (OK) |
| 0.24 0.05 0.72 -0.05 -0.25 -0.38 -0.33 0.66 0.05 -0.63 | 4 (ERR) |
| 0.08 0.41 -0.21 0.41 0.59 -0.12 -0.54 0.27 0.38 0.00 | 5 (OK) |
| -0.76 -0.35 -0.09 1.25 -0.78 0.55 -0.22 0.61 0.51 0.27 | 6 (OK) |
| -0.15 0.11 0.54 -0.53 0.55 0.17 0.09 -0.72 0.03 0.12 | 7 (ERR) |
| 0.03 0.41 0.49 -0.44 -0.01 0.05 -0.05 -0.03 -0.32 -0.30 | 8 (ERR) |
| 0.63 -0.47 -0.15 0.17 0.38 -0.24 0.58 0.07 -0.16 0.54 | 9 (OK) |

| Estimated output (test dataset) – Experiment #8 | Digit |
|---|---|
| 0.10 0.10 0.12 0.10 0.12 0.13 0.13 0.26 0.17 0.39 | 0 (OK) |
| 0.13 0.10 0.11 0.10 0.11 0.10 0.29 0.23 0.32 0.10 | 1 (OK) |
| 0.26 0.38 0.10 0.10 0.12 0.10 0.10 0.17 0.10 0.10 | 2 (OK) |
| 0.10 0.10 0.10 0.10 0.10 0.17 0.39 0.10 0.38 0.10 | 3 (ERR) |
| 0.15 0.10 0.24 0.10 0.10 0.10 0.10 0.39 0.37 0.10 | 4 (OK) |
| 0.20 0.12 0.10 0.10 0.37 0.10 0.10 0.10 0.17 0.12 | 5 (ERR) |
| 0.10 0.10 0.10 0.39 0.10 0.16 0.11 0.30 0.14 0.10 | 6 (OK) |
| 0.10 0.11 0.39 0.10 0.10 0.15 0.10 0.10 0.17 0.10 | 7 (OK) |
| 0.10 0.25 0.34 0.10 0.10 0.10 0.10 0.10 0.10 0.10 | 8 (ERR) |
| 0.39 0.10 0.10 0.10 0.28 0.10 0.27 0.11 0.10 0.21 | 9 (OK) |
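Mapping an output vector to a digit follows a simple rule: take the output neuron with the highest activation as the predicted digit, and count a pattern as OK when that neuron is the one holding the 1.0 in the real output. A plain-Java sketch of that rule, with illustrative helper names of ours:

// Returns the index of the output neuron with the highest activation
public static int argMax(double[] output) {
    int best = 0;
    for (int i = 1; i < output.length; i++) {
        if (output[i] > output[best]) {
            best = i;
        }
    }
    return best;
}

// A pattern counts as OK when the strongest estimated neuron
// matches the neuron holding the 1.0 in the real output
public static boolean isCorrect(double[] estimated, double[] real) {
    return argMax(estimated) == argMax(real);
}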
The experiments shown in this article considered images of 10x10 pixels. We recommend that you try using 20x20 pixel datasets to build a neural net able to classify digit images of this size.
You should also change the training parameters of the neural net to achieve better classifications.
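As a starting point, the learning-rate sweep from the experiments table can be scripted with the same calls used in the training code. This is a minimal sketch under that assumption; in a real run, the network weights should also be re-initialized between experiments:

// Sketch: sweep the learning rates used in the experiments table
double[] learningRates = {0.3, 0.5, 0.8};
for (double lr : learningRates) {
    Backpropagation bp = new Backpropagation(nn, neuralDataSetToTrain,
            LearningAlgorithm.LearningMode.BATCH);
    bp.setLearningRate(lr);
    bp.setMaxEpochs(typedEpochs);
    bp.setMinOverallError(0.001);
    bp.setMomentumRate(0.7);
    try {
        bp.train();
        System.out.println("lr=" + lr + " -> overall error: "
                + bp.getOverallGeneralError());
    } catch (NeuralException ne) {
        ne.printStackTrace();
    }
}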
To summarize, we applied neural network techniques to perform pattern recognition on the digits 0 to 9 in images. The application shown here can be extended to any type of character instead of digits, under the condition that the neural network is presented with all of the predefined characters.
If you enjoyed this excerpt, check out the book Neural Network Programming with Java Second Edition to learn more about leveraging the multi-platform nature of Java to build and run your personal neural networks everywhere.