Having worked through the movie preference example by hand, we will now, as promised, implement Naïve Bayes from scratch. After that, we will implement it again using the scikit-learn package.
Implementing Naïve Bayes from scratch
Before we develop the model, let’s define the toy dataset we just worked with:
>>> import numpy as np
>>> X_train = np.array([
...     [0, 1, 1],
...     [0, 0, 1],
...     [0, 0, 0],
...     [1, 1, 0]])
>>> Y_train = ['Y', 'N', 'Y', 'Y']
>>> X_test = np.array([[1, 1, 0]])
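Before building on this data, it can help to confirm its layout. The following is only a minimal sanity-check sketch, not part of the model itself: it re-declares the same arrays so that it runs on its own, and the names `n_samples`, `n_features`, and `label_counts` are introduced here purely for illustration.

```python
from collections import Counter

import numpy as np

# Same toy data as defined above, repeated so this snippet is self-contained.
X_train = np.array([
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
    [1, 1, 0]])
Y_train = ['Y', 'N', 'Y', 'Y']
X_test = np.array([[1, 1, 0]])

# Rows are samples and columns are binary features.
n_samples, n_features = X_train.shape
assert n_samples == len(Y_train)      # one label per training row
assert X_test.shape[1] == n_features  # test rows share the feature layout

# Count how often each label occurs in the training set.
label_counts = Counter(Y_train)
print(n_samples, n_features)  # 4 3
print(dict(label_counts))     # {'Y': 3, 'N': 1}
```

The label counts (three `'Y'` samples and one `'N'` sample) are the raw class frequencies in the training set.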
For the model, we start with the prior. We first group the samples by label and record the indices of the samples that belong to each class:
>>> def get_label_indices(labels):
...     """
...     Group samples based on their labels and return indices
...     @param labels: list of labels
...     @return: dict, {class1: [indices], class2: [indices]}
... ...