The previous chapters focused on the different stages that must be executed in a machine learning (ML) project. Many moving parts have to be tied together for an ML model to run and produce results successfully. The process of tying these pieces of the ML workflow together is known as building pipelines. A pipeline is a generalized but very important concept for a data scientist. In software engineering, people build pipelines that take software from source code to deployment. Similarly, in ML, a pipeline is created to let data flow from its raw format to some useful information. It also provides a mechanism to construct several ML pipelines in parallel in order to compare the results of different ML methods.
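As a concrete illustration of chaining preprocessing and modeling stages into a single unit, here is a minimal sketch using scikit-learn's Pipeline; the chapter names no specific framework, so the library choice, dataset, and stage names are assumptions for demonstration only.

```python
# Minimal ML pipeline sketch (scikit-learn is an assumed library choice).
# Each stage's output feeds the next: raw data -> scaling -> classifier.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stages are (name, transformer/estimator) pairs executed in order.
pipe = Pipeline([
    ("scale", StandardScaler()),                  # stage 1: normalize features
    ("clf", LogisticRegression(max_iter=1000)),   # stage 2: fit the model
])

# Calling fit() runs every stage in sequence on the training data.
pipe.fit(X_train, y_train)
accuracy = pipe.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Because the whole chain behaves like a single estimator, the same `fit`/`score` calls work regardless of how many intermediate stages the pipeline contains.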
Each stage of a pipeline is fed processed data from its preceding stage; that is, the output of a processing unit...
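The idea of running several pipelines side by side to compare ML methods can be sketched as follows; the two candidate models and the use of cross-validation are hypothetical choices, not prescribed by the text.

```python
# Sketch of comparing several ML methods by wrapping each in an
# otherwise identical pipeline (models and dataset are assumptions).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Same preprocessing stage, different final estimators.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

results = {}
for name, model in candidates.items():
    pipe = Pipeline([("scale", StandardScaler()), ("model", model)])
    scores = cross_val_score(pipe, X, y, cv=5)  # 5-fold cross-validation
    results[name] = scores.mean()
    print(f"{name}: mean accuracy {results[name]:.2f}")
```

Keeping the preprocessing stage identical across candidates ensures the comparison reflects the models themselves rather than differences in data preparation.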