Mining the 20 Newsgroups Dataset with Text Analysis Techniques
In previous chapters, we covered a range of fundamental machine learning concepts and supervised learning algorithms. Starting with this chapter, as the second step of our learning journey, we will cover in detail several important unsupervised learning algorithms and techniques related to text analysis. To make the journey more interesting, we will start with a Natural Language Processing (NLP) problem: exploring the 20 newsgroups dataset. You will gain hands-on experience working with text data, especially in converting words and phrases into machine-readable values and in removing words that carry little meaning. We will also visualize text data by mapping it into a two-dimensional space using unsupervised learning.
We will go into detail on each of the following topics:
- How computers understand language – NLP
- Touring popular NLP libraries and picking up...