Exercises
In this exercise, we will use the DBWorld e-mails dataset from the UCI Machine Learning repository to compare the relative performance of the Naïve Bayes and BayesLogit methods. The dataset contains 64 e-mails from the DBWorld newsletter, and the task is to classify each e-mail as either a conference announcement or everything else. The reference for this dataset is a course by Prof. Michele Filannino (reference 5 in the References section of this chapter). The dataset can be downloaded from the UCI website at https://archive.ics.uci.edu/ml/datasets/DBWorld+e-mails#.
Some preprocessing of the dataset is required before it can be used with both methods. The dataset is in the ARFF format. You need to download the foreign R package (http://cran.r-project.org/web/packages/foreign/index.html) and use its read.arff( ) function to read the file into an R data frame.
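As a minimal sketch of this reading step, the snippet below writes a tiny ARFF file to a temporary location and loads it with read.arff( ). The toy attributes (conference, deadline, CLASS) and their values are made up for illustration and are not taken from the DBWorld data; to work on the real dataset, replace arff_path with the path to the file you downloaded.

```r
# The foreign package provides read.arff(); if library() fails,
# run install.packages("foreign") first.
library(foreign)

# Toy ARFF content standing in for the downloaded DBWorld file.
# Attribute names and rows are hypothetical, for illustration only.
arff_lines <- c(
  "@relation toy_emails",
  "@attribute conference {0,1}",
  "@attribute deadline {0,1}",
  "@attribute CLASS {0,1}",
  "@data",
  "1,1,1",
  "0,0,0",
  "1,0,1"
)

# Write the toy file to a temporary path so the example is self-contained.
arff_path <- tempfile(fileext = ".arff")
writeLines(arff_lines, arff_path)

# Read the ARFF file into an R data frame; nominal attributes become factors.
emails <- read.arff(arff_path)

# Inspect the structure of the resulting data frame.
str(emails)
```

Because nominal ARFF attributes are read as factors, checking the column types with str( ) before fitting either model is a sensible first preprocessing step.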