Technical requirements
In this chapter, we'll process a dataset. The dataset and the chapter code can be found at https://github.com/PacktPublishing/Mastering-spaCy/tree/main/Chapter06.
Besides spaCy, we'll use the Python library pandas to manipulate the dataset, as well as the awk command-line tool. pandas can be installed via pip, and awk comes preinstalled in many Linux distributions.
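Before running the chapter code, it can help to confirm that these requirements are actually available. The following is a minimal sketch, not part of the book's repository. The helper name check_requirements is hypothetical. It relies only on the standard library: importlib.util.find_spec reports whether a Python package can be imported, and shutil.which reports whether an executable such as awk is on the PATH.

```python
import importlib.util
import shutil


def check_requirements(packages=("spacy", "pandas"), tools=("awk",)):
    """Hypothetical helper: map each requirement name to True if it is available."""
    status = {}
    for pkg in packages:
        # find_spec returns None when a top-level package cannot be found
        status[pkg] = importlib.util.find_spec(pkg) is not None
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH
        status[tool] = shutil.which(tool) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

If pandas is reported missing, running pip install pandas installs it. A missing awk is usually fixed through the operating system's package manager rather than pip.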