The BernoulliRBM
The only Restricted Boltzmann Machine implemented in scikit-learn is called BernoulliRBM because it constrains the type of probability distribution it can learn. A Bernoulli distribution describes a binary outcome: a value is one with some probability p and zero otherwise. Accordingly, the scikit-learn documentation states that the model assumes the inputs are either binary values or values between zero and one, where each value encodes the probability that the corresponding feature is turned on. To match this assumption, we will alter our dataset so that it contains only hard black or white pixel intensities. Every cell value will then be either zero or one, which should make learning quicker and more robust. We will accomplish this in two steps:
- We will scale the values of the pixels to be between zero and one
- We will change the pixel values in place to be true if the value is over 0.5, and false otherwise
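As a compact preview, the two steps can be sketched on a tiny made-up array. This is only an illustration: the `images` array and the assumption that raw intensities range from 0 to 255 are hypothetical and stand in for the real dataset, and the sketch creates new arrays rather than overwriting the original so that each step can be inspected.

```python
import numpy as np

# Hypothetical toy "images": 2 samples with 4 pixels each.
# Assumption: raw intensities are 8-bit values in the range 0-255.
images = np.array([[0, 64, 128, 255],
                   [255, 200, 100, 0]], dtype=np.float64)

# Step 1: scale the pixel values to lie between zero and one.
scaled = images / 255.0

# Step 2: threshold at 0.5, giving True for values over 0.5 and False otherwise.
binary = scaled > 0.5

print(scaled.min(), scaled.max())
print(binary)
```

The boolean array can be converted back to zeros and ones with `binary.astype(float)` if a numeric type is preferred.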
Let's start by scaling...