Naive Bayes with scikit-learn
Let's fit a Naive Bayes classifier with scikit-learn. We will compare the performances of Naive Bayes and logistic regression classifiers on increasingly large samples of two different training sets. The Breast Cancer Wisconsin dataset consists of features extracted from fine needle aspirate images of breast masses. The task is to classify masses as malignant or benign using 30 real-valued features that describe the cell nuclei in each fine needle aspirate image. The dataset has 212 malignant instances and 357 benign instances. The Pima Indians Diabetes Database task is to predict whether an individual has diabetes using eight features representing the number of times the individual has been pregnant, measures from an oral glucose tolerance test, diastolic blood pressure, triceps skin fold thickness, body mass index, age, and other diagnostics. The dataset has 268 diabetic instances and 500 non-diabetic instances:
# In[1]: %matplotlib inline # In[2]: import...