While we introduced text analysis in Chapter 1, What is Text Analysis?, we did not discuss any of the technical details behind building a text analysis pipeline. In this chapter, we will introduce spaCy's language models, which serve as the first step in text analysis and are the first building block in our pipelines. We will look at how spaCy can help us in our text analysis tasks and discuss some of its more powerful functionalities, such as part-of-speech (POS) tagging and named entity recognition (NER). We will finish with an example of how we can preprocess data quickly and efficiently using spaCy, a natural language processing library for Python.
We will cover the following topics in this chapter:
- spaCy
- Installation
- Tokenizing Text
- Summary
- References