In this book, we will examine advanced concepts of the Hadoop ecosystem and build high-performance Hadoop data pipelines with security, monitoring, and data governance.
We will also build enterprise-grade applications using Apache Spark and Apache Flink. This book explains the internal workings of Hadoop and applies them to build solutions for several real-world use cases. We will learn best practices for enterprises that use Hadoop 3 as a data platform, including authentication and authorization. We will also learn how to model data in Hadoop, gain an in-depth understanding of distributed computing with Hadoop 3, and explore different batch data-processing patterns.
Finally, we will see how components in the Hadoop ecosystem can be integrated effectively to implement a fast and reliable big data pipeline.