Preface
Apache Hadoop is an open source distributed computing technology that helps users process large volumes of data with relative ease and draw insights from that data. Cloudera, with its open source distribution of Hadoop, has made Big Data analytics accessible to anyone interested.
This book fully prepares you to be a Hadoop administrator, with special emphasis on Cloudera. It provides step-by-step instructions on setting up and managing a robust Hadoop cluster running Cloudera's Distribution Including Apache Hadoop (CDH).
This book starts out by giving you a brief introduction to Apache Hadoop and Cloudera. You will then move on to learn about all the tools and techniques needed to set up and manage a production-standard Hadoop cluster using CDH and Cloudera Manager.
In this book, you will learn the Hadoop architecture by understanding the different features of HDFS and walking through the entire flow of a MapReduce process. With this understanding, you will start exploring the different applications packaged into CDH and will follow a step-by-step guide to set up HDFS High Availability (HA) and HDFS Federation.
You will learn to use Cloudera Manager, Cloudera's cluster management application. Using Cloudera Manager, you will walk through the steps to configure security using Kerberos, learn about events and alerts, and configure backups.