Building Pipelines for NLP Projects
In general, a pipeline refers to a structure that allows a streamlined flow of air, water, or something similar. In this context, pipeline has a similar meaning. It helps to streamline various stages of an NLP project.
An NLP project is done in various stages, such as tokenization, stemming, feature extraction (TFIDF matrix generation), and model building. Instead of carrying out each stage separately, we create an ordered list of all these stages. This list is known as a pipeline. The Pipeline
class of sklearn helps us combine these stages into one object that we can use to perform these stages one after the other in a sequence. We will solve a text classification problem using a pipeline in the next section to understand the working of a pipeline better.
Exercise 3.14: Building the Pipeline for an NLP Project
In this exercise, we will develop a pipeline that will allow us to create a TFIDF matrix representation from sklearn's fetch_20newsgroups...