Summary
In this chapter, we learned that caching downloaded web pages saves time and minimizes bandwidth when recrawling a website. The main drawback is that the cache takes up disk space, which can be reduced through compression. Additionally, building the cache on top of an existing database system, such as MongoDB, helps avoid filesystem limitations.
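As a recap, the compression idea can be sketched as a minimal cache class that compresses each pickled page record before storing it. This is an illustrative sketch using an in-memory dictionary and the hypothetical name CompressedCache, not the chapter's actual implementation; a real cache would persist records to disk or a database.

```python
import pickle
import zlib
from datetime import datetime, timedelta, timezone


class CompressedCache:
    """Illustrative cache that zlib-compresses pickled records to save space."""

    def __init__(self, expires=timedelta(days=30)):
        self.expires = expires
        self._store = {}  # stand-in for disk or database storage

    def __setitem__(self, url, result):
        # Store the download result with a timestamp, compressed
        record = {'result': result, 'timestamp': datetime.now(timezone.utc)}
        self._store[url] = zlib.compress(pickle.dumps(record))

    def __getitem__(self, url):
        # Decompress, then check whether the record has expired
        record = pickle.loads(zlib.decompress(self._store[url]))
        if datetime.now(timezone.utc) - record['timestamp'] > self.expires:
            raise KeyError(url + ' has expired')
        return record['result']
```

For typical HTML, which is highly repetitive, the compressed record is much smaller than the raw page, at the cost of a little CPU time on each read and write.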
In the next chapter, we will add further functionality to our crawler so that it can download multiple web pages concurrently and crawl faster.