This chapter continues our journey of classifying text data, a great starting point for learning machine learning classification and one with broad real-life applications. We will focus on topic classification of the news data we used in Chapter 2, Exploring the 20 Newsgroups Dataset with Text Analysis Algorithms, and use another powerful classifier, the support vector machine (SVM), to solve such problems.
We will cover the following topics in detail:
- Term frequency-inverse document frequency
- Support vector machine
- The mechanics of SVM
- The implementations of SVM
- Multiclass classification strategies
- The nonlinear kernels of SVM
- Choosing between linear and Gaussian kernels
- Overfitting and reducing overfitting in SVM
- News topic classification with SVM
- Tuning with grid search and cross-validation
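As a rough preview of how several of these pieces fit together, the following is a minimal sketch that assumes scikit-learn is installed. It chains TF-IDF features into a linear SVM and tunes the penalty parameter `C` with grid search and cross-validation. The eight-sentence corpus, its two labels, and the grid values are hypothetical stand-ins for the 20 Newsgroups data. They are not the chapter's actual setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus standing in for 20 Newsgroups posts (two topics)
docs = [
    "the rocket launch reached orbit",
    "nasa plans a new space mission",
    "astronauts trained for the shuttle flight",
    "the satellite orbit was adjusted",
    "the team won the hockey game",
    "the goalie blocked every shot in the game",
    "fans cheered as the hockey season began",
    "the players scored three goals",
]
labels = ["space"] * 4 + ["hockey"] * 4

# TF-IDF turns raw text into weighted term vectors; LinearSVC fits a linear SVM
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("svm", LinearSVC()),
])

# Grid search over the SVM penalty C, scored with 2-fold cross-validation
# (the grid values are illustrative assumptions, not recommended settings)
param_grid = {"svm__C": [0.1, 1, 10]}
search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(docs, labels)

prediction = search.predict(["a new rocket for the space station"])
print("Best parameters:", search.best_params_)
print("Predicted topic:", prediction[0])
```

Wrapping the vectorizer and classifier in one `Pipeline` means the TF-IDF weights are refit on each training fold during cross-validation. This keeps vocabulary and document-frequency statistics from the held-out fold from leaking into training.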