Using the Naive Bayes algorithm
The Naive Bayes algorithm is quite fast, which makes it useful for the initial analysis of discrete variables. The algorithm calculates frequencies, or probabilities, for each possible state of every input variable in each state of the predictable variable. These probabilities are then used for predictions on new datasets with known input attributes. As mentioned, the algorithm supports only discrete (or discretized) attributes. Each input attribute is used separately from the other input attributes; in other words, the input attributes are treated as independent of one another given the state of the predictable variable, which is the "naive" assumption behind the name. I will show an example in Python. Let's start with the necessary imports:
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB
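Before moving on to the data, the frequency counting described earlier can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the sklearn implementation: the toy rows, the attribute names Region and Cars, and the helper functions fit and predict_proba are all hypothetical, and no smoothing is applied.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (input attributes, state of the predictable variable).
rows = [
    ({"Region": "Europe", "Cars": "0"}, 1),
    ({"Region": "Europe", "Cars": "2"}, 0),
    ({"Region": "Pacific", "Cars": "0"}, 1),
    ({"Region": "Pacific", "Cars": "2"}, 1),
    ({"Region": "Europe", "Cars": "2"}, 0),
    ({"Region": "Pacific", "Cars": "0"}, 0),
]


def fit(rows):
    """Count target states and, per target state, each input attribute state."""
    class_counts = Counter(target for _, target in rows)
    # state_counts[target][attribute][state] -> frequency
    state_counts = defaultdict(lambda: defaultdict(Counter))
    for attrs, target in rows:
        for name, state in attrs.items():
            state_counts[target][name][state] += 1
    return class_counts, state_counts


def predict_proba(attrs, class_counts, state_counts):
    """Combine the prior with per-attribute frequencies, one attribute at a time."""
    total = sum(class_counts.values())
    scores = {}
    for target, n in class_counts.items():
        score = n / total  # prior probability of this target state
        for name, state in attrs.items():
            # Each attribute contributes independently (the naive assumption).
            score *= state_counts[target][name][state] / n
        scores[target] = score
    norm = sum(scores.values())
    if norm == 0:
        # Assumption for this sketch: fall back to the priors when an
        # unseen state zeroes out every score (no smoothing is used here).
        return {t: n / total for t, n in class_counts.items()}
    return {t: s / norm for t, s in scores.items()}


class_counts, state_counts = fit(rows)
new_case = {"Region": "Pacific", "Cars": "0"}
# On this toy data, target state 1 gets a probability of about 0.8.
print(predict_proba(new_case, class_counts, state_counts))
```

Note that this sketch works on discrete states, as in the description of the algorithm. The GaussianNB class imported above instead models each input feature with a normal distribution per target class, so it accepts numeric features directly.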
The next step is to create the training and test sets from the SQL Server data I read earlier. In addition, as with other algorithms from the sklearn library, I need to prepare the feature matrix and the target vector:
Xtrain = TM.loc[TM.TrainTest == 1...