One of the critical mistakes many big data architectures make is trying to handle multiple stages of the data pipeline with a single tool. A fleet of servers handling the end-to-end data pipeline, from data storage and transformation to visualization, may be the most straightforward architecture, but it is also the most vulnerable to breakdowns in the pipeline. A tightly coupled big data architecture like this also rarely provides the best balance of throughput and cost for your needs.
It is recommended that big data architects decouple the pipeline. Decoupling storage and processing into multiple stages in particular has several advantages, including increased fault tolerance. For example, if something goes wrong in the second round of processing and the hardware dedicated to that task fails, you won't have to start again from the beginning of the pipeline; your system can resume from the second storage stage, as the sketch below illustrates. Decoupling your storage...
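The following is a minimal sketch of this idea, not a production pipeline: each stage persists its output to durable storage before the next stage begins, so a failed stage can be rerun from the most recent persisted output rather than from raw input. The `pipeline_store` directory, the stage names, and the toy transforms are all hypothetical stand-ins; in a real architecture each stage would typically be a separate job writing to an object store.

```python
from pathlib import Path
import json

# Hypothetical staging directory standing in for durable intermediate storage
# (for example, an object store bucket in a real deployment).
STORE = Path("pipeline_store")
STORE.mkdir(exist_ok=True)

def run_stage(name, transform, upstream_output):
    """Run one processing stage, persisting its output before downstream stages start."""
    out_path = STORE / f"{name}.json"
    if out_path.exists():
        # Output is already durable: skip recomputation and resume from here.
        return json.loads(out_path.read_text())
    result = transform(upstream_output)
    out_path.write_text(json.dumps(result))
    return result

# Illustrative transforms; in practice these would run on independent compute resources.
raw = run_stage("ingest", lambda _: [1, 2, 3, 4], None)
cleaned = run_stage("clean", lambda rows: [r for r in rows if r % 2 == 0], raw)
aggregated = run_stage("aggregate", lambda rows: {"sum": sum(rows)}, cleaned)
print(aggregated)
```

If the "aggregate" stage fails, rerunning the script picks up the persisted "ingest" and "clean" outputs and only repeats the failed work, which is the fault-tolerance benefit of decoupling described above.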