Chapter 3. K-Nearest Neighbors and Naive Bayes
In the previous chapter, we covered computationally intensive methods. In contrast, this chapter discusses two simple methods to balance them out: k-nearest neighbors (KNN) and Naive Bayes. Before touching on KNN, we explain the curse of dimensionality with a simulated example. Subsequently, a breast cancer medical dataset is used with KNN to predict whether a tumor is malignant or benign. In the final section of the chapter, Naive Bayes is explained with spam/ham classification, which also involves the application of natural language processing (NLP) techniques consisting of the following basic preprocessing and modeling steps (minimal sketches of both models follow the list):
- Punctuation removal
- Word tokenization and lowercase conversion
- Stopwords removal
- Stemming
- Lemmatization with POS tagging
- Conversion of words into TF-IDF to create numerical representation of words
- Application of the Naive Bayes classifier
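
As a hedged preview, the sketch below strings these steps together using NLTK and scikit-learn. The two-message corpus, labels, and NLTK resource names are illustrative assumptions, not the chapter's actual dataset; resource names can also vary across NLTK versions.

```python
# A minimal sketch of the preprocessing pipeline listed above (illustrative only).
import string

import nltk
from nltk.corpus import stopwords, wordnet
from nltk.stem import PorterStemmer, WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# One-time downloads of the NLTK resources used below (names may differ by version)
for resource in ("punkt", "stopwords", "wordnet", "averaged_perceptron_tagger"):
    nltk.download(resource, quiet=True)

STOPWORDS = set(stopwords.words("english"))
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

def penn_to_wordnet(tag):
    """Map a Penn Treebank POS tag to a WordNet POS constant."""
    mapping = {"J": wordnet.ADJ, "V": wordnet.VERB, "R": wordnet.ADV}
    return mapping.get(tag[0], wordnet.NOUN)

def preprocess(text):
    # Punctuation removal
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Word tokenization and lowercase conversion
    tokens = [t.lower() for t in nltk.word_tokenize(text)]
    # Stopword removal
    tokens = [t for t in tokens if t not in STOPWORDS]
    # Stemming
    tokens = [stemmer.stem(t) for t in tokens]
    # Lemmatization with POS tagging (applied in sequence here to mirror the
    # list order; the chapter demonstrates both techniques)
    tokens = [lemmatizer.lemmatize(t, penn_to_wordnet(tag))
              for t, tag in nltk.pos_tag(tokens)]
    return " ".join(tokens)

# Hypothetical two-message corpus; the chapter uses a real SMS spam dataset.
messages = ["Win a FREE prize now!!!", "Are we still meeting for lunch today?"]
labels = [1, 0]  # 1 = spam, 0 = ham

# Conversion of words into TF-IDF, followed by Naive Bayes
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(preprocess(m) for m in messages)
model = MultinomialNB().fit(X, labels)
```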
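The KNN portion of the chapter can be previewed in the same spirit. The sketch below assumes scikit-learn's bundled copy of the Wisconsin breast cancer dataset; the chapter itself may load the data differently.

```python
# A hedged KNN sketch, assuming scikit-learn's bundled breast cancer dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # malignant vs. benign labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Standardize features: distance-based methods like KNN are sensitive to scale,
# and their distances lose meaning in high dimensions (curse of dimensionality).
scaler = StandardScaler().fit(X_train)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(scaler.transform(X_train), y_train)
print("Test accuracy:", knn.score(scaler.transform(X_test), y_test))
```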