The following figure illustrates the high-level design of the pipeline that we will be building throughout the first half of this chapter:
Figure 1: A generic, multistage pipeline
Keep in mind that this is neither the only nor necessarily the best way to implement a data-processing pipeline. Pipelines are inherently application-specific, so there is no one-size-fits-all guide for constructing efficient ones.
That said, the proposed design applies to a wide variety of use cases, including the crawler component for the Links 'R' Us project. Let's examine the preceding figure in more detail and identify the basic components that make up the pipeline:
- The input source: Inputs function as data sources that pump data into the pipeline...