Analyzing Text Data in Python
We will often need to perform exploratory data analysis on text data. The volume of digital data being created has grown significantly, and text makes up a substantial proportion of it. Common everyday examples include emails, social media posts, and text messages. Text data is classified as unstructured data because it usually doesn’t appear in rows and columns.
In previous chapters, we focused on exploratory data analysis techniques for structured data (i.e., data that appears in rows and columns). In this chapter, however, we turn to exploratory data analysis techniques for a very common type of unstructured data: text data.
We will discuss common techniques for preparing and analyzing text data. The chapter covers the following:
- Preparing text data
- Removing stop words
- Analyzing part of speech (POS)
- Performing stemming and lemmatization ...