Chapter 8. Text Mining and Social Network Analysis
In this chapter, we will cover the following recipes:
- Creating a categorized corpus
- Tokenizing news articles into sentences and words
- Stemming, lemmatizing, filtering, and TF-IDF scores
- Recognizing named entities
- Extracting topics with non-negative matrix factorization
- Implementing a basic terms database
- Computing social network density
- Calculating social network closeness centrality
- Determining the betweenness centrality
- Estimating the average clustering coefficient
- Calculating the assortativity coefficient of a graph
- Getting the clique number of a graph
- Creating a document graph with cosine similarity
Introduction
Humans have communicated through language for thousands of years. Handwritten texts have been around for ages, and the Gutenberg press was a huge development. Now that we have computers, the Internet, and social media, the volume of text being produced has spiraled out of control.
This chapter will help you cope with the flood of textual...