In this last chapter, we covered advanced topics for Apache Hadoop. We started with business use cases for Apache Hadoop in different industries: healthcare, oil and gas, finance and banking, government, telecommunications, retail, and insurance. We then looked at advanced Hadoop storage formats that are used today by many tools in the Apache Hadoop ecosystem: Parquet, ORC, and Avro. We looked at the real-time streaming capabilities of Apache Storm, which can be used on a Hadoop cluster. Finally, we looked at Apache Spark, covering its architecture and its different components, including streaming, SQL, and analytical capabilities.
We started this book with the history of Apache Hadoop, its architecture, and open source versus commercial Hadoop implementations. We looked at new Hadoop 3.X features...