Chapter 8. There Is No Spoon – The Realities of Real-time Distributed Data Collection
In this last chapter, I thought we'd cover some of the less concrete, more random thoughts I have around data collection into Hadoop. There's no hard science behind some of this and you should feel perfectly at ease to disagree with me.
While Hadoop is a great tool for consuming vast quantities of data, I often think of a picture of the logjam that occurred in 1886 on the St. Croix River in Minnesota (http://www.nps.gov/sacn/historyculture/stories.htm). When dealing with too much data you want to make sure you don't jam your river. Be sure you take the previous chapter on monitoring seriously and not just as a nice to have.