Implementing Naïve Bayes
Having worked through the movie preference prediction example by hand, we will now, as promised, code Naïve Bayes from scratch. After that, we will implement it using the scikit-learn package.
Implementing Naïve Bayes from scratch
Before we develop the model, let's define the toy dataset we just worked with:
>>> import numpy as np
>>> X_train = np.array([
...     [0, 1, 1],
...     [0, 0, 1],
...     [0, 0, 0],
...     [1, 1, 0]])
>>> Y_train = ['Y', 'N', 'Y', 'Y']
>>> X_test = np.array([[1, 1, 0]])
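Before building the model, a quick sanity check of the toy dataset may be useful. The following is a minimal sketch and not part of the implementation itself: it confirms that the arrays line up and counts how many samples fall into each class. The names label_counts and class_fractions are introduced here purely for illustration.

```python
from collections import Counter

import numpy as np

# Same toy data as above: each row is a sample, each column a binary feature.
X_train = np.array([
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
    [1, 1, 0]])
Y_train = ['Y', 'N', 'Y', 'Y']
X_test = np.array([[1, 1, 0]])

# One label per training row, and the test sample has the same feature count.
assert X_train.shape[0] == len(Y_train)
assert X_train.shape[1] == X_test.shape[1]

# Count the samples in each class and express the counts as plain fractions
# (illustrative only; no smoothing is applied here).
label_counts = Counter(Y_train)
class_fractions = {label: count / len(Y_train)
                   for label, count in label_counts.items()}

print(label_counts)      # Counter({'Y': 3, 'N': 1})
print(class_fractions)   # {'Y': 0.75, 'N': 0.25}
```

Three of the four training samples carry the label 'Y', which is the class imbalance the prior will need to reflect.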
For the model, we start with the prior. We first group the samples by label and record the indices of the samples that belong to each class:
>>> def get_label_indices(labels):
...     """
...     Group samples based on their labels and return indices
...     @param labels: list of labels
...     @return: dict, {class1: [indices], class2: [indices]}...