Mining the 20 Newsgroups Dataset with Text Analysis Techniques
In previous chapters, we covered a range of fundamental machine learning concepts and supervised learning algorithms. Starting with this chapter, as the second step of our learning journey, we will cover in detail several important unsupervised learning algorithms and techniques. To make the journey more interesting, we will start with a natural language processing (NLP) problem: exploring newsgroups data. You will gain hands-on experience in working with text data, especially in converting words and phrases into machine-readable values and in removing words that carry little meaning. We will also visualize text data by mapping it into a two-dimensional space in an unsupervised manner.
We will go into detail on each of the following topics:
- NLP fundamentals and applications
- Touring Python NLP libraries
- Tokenization, stemming, and lemmatization
- Getting...