Chapter 2: Core Operations with spaCy
In this chapter, you will learn the core operations of spaCy, such as creating a language pipeline, tokenizing text, and splitting text into sentences.
First, you'll learn what a language processing pipeline is and what its components are. We'll continue with general spaCy conventions – important classes and class organization – to give you a solid understanding of how the library is organized.
You will then learn about the first pipeline component – the Tokenizer. You'll also learn about an important linguistic concept – lemmatization – along with its applications in natural language understanding (NLU). Following that, we will cover container classes and spaCy data structures in detail. We will finish the chapter with useful spaCy features that you'll use in everyday NLP development.
We're going to cover the following...