Upgrading the classifier
To support multi-dimensional data, my first step is to refactor the current tests so that they send each one-dimensional input as a one-element tuple. This will set up our first test quite nicely.
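As a minimal sketch of that refactoring, assume the existing tests store plain numbers for each class. The helper name as_tuples and the old_style data below are hypothetical, introduced only for illustration; they are not part of the classifier or its test suite.

```python
def as_tuples(observations):
    """Wrap every scalar observation in a one-element tuple, class by class.

    Hypothetical helper: it only illustrates the change of input format.
    """
    return {
        label: [(value,) for value in values]
        for label, values in observations.items()
    }


# Assumed shape of the old one-dimensional test data.
old_style = {
    'class a': [1, 2, 3, 4, 5],
    'class b': [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
}

new_style = as_tuples(old_style)
print(new_style['class a'])  # [(1,), (2,), (3,), (4,), (5,)]
```

Note the trailing comma in (value,): without it, Python treats the parentheses as plain grouping and the value stays a bare number rather than becoming a tuple.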
After modifying the tests, rerun them to make sure that they break; the result may surprise you. Here is an example of a modified test that passes its observations in the multi-dimensional format:
def given_classes_of_different_likelihood_test():
    classifier = NaiveBayes.Classifier()
    # Every observation is now a one-element tuple rather than a bare number.
    observation = (3,)
    observations = {
        'class a': [(1,), (2,), (3,), (4,), (5,)],
        'class b': [(1,), (1,), (2,), (2,), (3,), (3,), (4,), (4,), (5,), (5,)]
    }
    results = classifier._probability_of_each_class_given_data(observation, observations)
    print(results)
    assert results['class b'] > results['class a'], "Should classify as class b when class probability is taken into account."
You can see that every observation we pass into the algorithm is a one-dimensional tuple. What's surprising...