Naive Bayes SMS spam classification example
A Naive Bayes classifier is developed here using the SMS Spam Collection data available at http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/. The NLP techniques discussed in this chapter are applied to preprocess the text before the Naive Bayes model is built. The data file is read first:
>>> import csv
>>> smsdata = open('SMSSpamCollection.txt','r')
>>> csv_reader = csv.reader(smsdata,delimiter='\t')
The following lines, which use the sys package, can be used if any utf-8 errors are encountered with older (2.x) versions of Python; they are not necessary with Python 3.6:
>>> import sys
>>> reload(sys)
>>> sys.setdefaultencoding('utf-8')
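On Python 3 the two calls above are not available, so encoding problems are usually handled by stating the encoding when the file is opened. The following is a minimal sketch of that approach, not the method used in this chapter; the helper name `read_sms_rows` is hypothetical, and the assumption that the data file is UTF-8 encoded should be checked against the actual download.

```python
import csv


def read_sms_rows(path, encoding='utf-8'):
    """Return the rows of a tab-separated file as lists of strings.

    Hypothetical helper: the name and the UTF-8 default are assumptions.
    """
    # newline='' lets the csv module handle line endings itself.
    with open(path, 'r', encoding=encoding, newline='') as handle:
        return list(csv.reader(handle, delimiter='\t'))
```

Each returned row is a list whose first item is the label and whose second item is the message text, so it can feed the same loop that the interactive session uses.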
The regular code continues from here:
>>> smsdata_data = []
>>> smsdata_labels = []
>>> for line in csv_reader:
...     smsdata_labels.append(line[0])
...     smsdata_data.append(line[1])
>>> smsdata...
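The loading loop above can be tried without the data file. The sketch below runs the same label/message split on two made-up lines held in memory; the sample messages and the helper name `split_labels_and_text` are assumptions for illustration only, and the check for short rows is an added safeguard that the original loop does not have.

```python
import csv
import io


def split_labels_and_text(lines):
    """Split tab-separated 'label<TAB>message' lines into two lists."""
    labels, messages = [], []
    for row in csv.reader(lines, delimiter='\t'):
        # Skip blank or malformed rows that lack both fields.
        if len(row) < 2:
            continue
        labels.append(row[0])
        messages.append(row[1])
    return labels, messages


# Made-up sample lines, not taken from the real dataset.
sample = io.StringIO("ham\tSee you at lunch\nspam\tClaim your free prize now\n")
labels, messages = split_labels_and_text(sample)
print(labels)    # ['ham', 'spam']
print(messages)  # ['See you at lunch', 'Claim your free prize now']
```

Passing an open file handle instead of the `io.StringIO` object gives the same two lists for the real data.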