In the following sections, we will put the generic pipeline package we built to the test by using it to construct the crawler pipeline for the Links 'R' Us project!
Following the single-responsibility principle, we will break down the crawl task into a sequence of smaller subtasks and assemble the pipeline illustrated in the following figure. Decomposing the work in this way has the added benefit that each stage processor can be tested in complete isolation, without the need to create a pipeline instance:
Figure 2: The stages of the crawler pipeline that we will be constructing
The full code for the crawler and its tests can be found in the Chapter07/crawler package in the book's GitHub repository.