So far, we have seen how Storm can be used to develop real-time stream processing applications. In practice, such real-time applications rarely run in isolation; more often than not, they are used in combination with batch processing jobs.
The most common platform for developing batch jobs is Apache Hadoop. In this chapter, we will see how applications built with Apache Storm can be deployed on existing Hadoop clusters with the help of the Storm-YARN framework, which enables optimized use and management of cluster resources. We will also cover how to write processed data into HDFS by creating an HDFS bolt in Storm.
In this chapter, we will cover the following topics:
- Overview of Apache Hadoop and its various components
- Setting up a Hadoop cluster
- Writing a Storm topology to persist data into HDFS
- Overview of Storm-YARN
- Deploying Storm-YARN on Hadoop...