Classifying data with the Naive Bayesian classifier
Bayesian classification is a way of updating your estimate of the probability that an item is in a given category, depending on what you already know about that item, the category, and the world at large. In a Naive Bayesian system, we also assume that all features of an item are independent of one another (strictly speaking, independent once the category is known). Real features often break this assumption. For example, elevation and average snowfall are not independent (higher elevations tend to have more snow), whereas elevation and median income should be. Even so, the algorithm has proven useful in a number of areas, such as spam detection in emails, automatic language detection, and document classification. In this recipe, we'll apply it to the mushroom dataset that we looked at in the Classifying data with decision trees recipe.
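To make the idea concrete, here is a minimal sketch of the calculation in plain Clojure. It is not how Weka implements its classifier and it is not the code used in this recipe. The toy data is made up for illustration (it is not the mushroom dataset), and the names training, prior, likelihood, score, and classify are all hypothetical. Add-one smoothing is an assumption added here so that a feature value never seen with a category does not force the whole product to zero.

```clojure
;; Toy training data, invented for illustration only.
(def training
  [{:class :edible    :features {:odor :none   :cap :flat}}
   {:class :edible    :features {:odor :none   :cap :bell}}
   {:class :edible    :features {:odor :almond :cap :flat}}
   {:class :poisonous :features {:odor :foul   :cap :flat}}
   {:class :poisonous :features {:odor :foul   :cap :bell}}])

(defn prior
  "P(class): the fraction of training items that belong to cls."
  [data cls]
  (/ (count (filter #(= cls (:class %)) data))
     (count data)))

(defn likelihood
  "P(feature = value | class), with add-one smoothing over the
  distinct values seen for that feature anywhere in the data."
  [data cls feature value]
  (let [in-class (filter #(= cls (:class %)) data)
        matches  (count (filter #(= value (get-in % [:features feature]))
                                in-class))
        n-values (count (distinct (map #(get-in % [:features feature])
                                       data)))]
    (/ (+ matches 1)
       (+ (count in-class) n-values))))

(defn score
  "Unnormalized posterior: P(class) times the product of each
  P(feature | class). Multiplying the per-feature terms together
  is exactly where the naive independence assumption is used."
  [data cls features]
  (reduce * (prior data cls)
          (map (fn [[f v]] (likelihood data cls f v)) features)))

(defn classify
  "Returns a map from each class to its normalized probability."
  [data features]
  (let [classes (distinct (map :class data))
        scores  (into {} (map (fn [c] [c (score data c features)]) classes))
        total   (reduce + (vals scores))]
    (into {} (map (fn [[c s]] [c (/ s total)]) scores))))

(classify training {:odor :foul :cap :flat})
;; => {:edible 1/3, :poisonous 2/3}
```

Because Clojure keeps integer division as exact ratios, the result is exact. Knowing that the odor is foul moves the estimate for :poisonous from its prior of 2/5 up to 2/3, which is the kind of update described above.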
Getting ready
First, we'll need to use the dependencies that we specified in the project.clj file in the Loading CSV and ARFF files into Weka recipe. We'll also use the defanalysis macro...