The need to be literate in both structured and unstructured data continues to grow. Structured data has well-established techniques, such as merging datasets and enforcing uniform data types, which we have reviewed in prior chapters. Working with unstructured data, by contrast, is a relatively new practice and is rapidly becoming a must-have skill in data analysis. Natural Language Processing (NLP) is central to that skill, so this chapter introduces the concepts and tools available for analyzing narrative free text. As technology has advanced, these techniques can help you surface insights in unstructured data that would have been difficult to uncover only a few years ago.
We will cover the following topics in this chapter:
- Preparing to work with unstructured data
- Tokenization explained
- Counting words and exploring results...