Hadoop 2.x Administration Cookbook
Administer and maintain large Apache Hadoop clusters

Product type: Paperback
Published: May 2017
Publisher: Packt
ISBN-13: 9781787126732
Length: 348 pages
Edition: 1st Edition
Author: Aman Singh

Table of Contents

Preface
1. Hadoop Architecture and Deployment
2. Maintaining Hadoop Cluster – HDFS
3. Maintaining Hadoop Cluster – YARN and MapReduce
4. High Availability
5. Schedulers
6. Backup and Recovery
7. Data Ingestion and Workflow
8. Performance Tuning
9. HBase Administration
10. Cluster Planning
11. Troubleshooting, Diagnostics, and Best Practices
12. Security
Index

What this book covers

Chapter 1, Hadoop Architecture and Deployment, covers Hadoop's architecture and components, its various installation modes, its important daemons, and the services that make Hadoop a robust system. It covers both single-node and multinode clusters.

Chapter 2, Maintaining Hadoop Cluster – HDFS, covers the storage layer, HDFS: block size, replication, cluster health, quota configuration, rack awareness, and the communication channels between nodes.

Chapter 3, Maintaining Hadoop Cluster – YARN and MapReduce, covers Hadoop's processing layer and its resource management framework, YARN. It explains YARN fundamentals and shows how to configure YARN components, submit jobs, and configure the job history server.

Chapter 4, High Availability, covers high availability for the Namenode and the ResourceManager, ZooKeeper configuration, HDFS storage policies, HDFS snapshots, and rolling upgrades.

Chapter 5, Schedulers, covers YARN schedulers such as the Fair Scheduler and the Capacity Scheduler, with detailed recipes for configuring queues, queue ACLs, and users and groups, as well as other queue administration commands.

Chapter 6, Backup and Recovery, covers the Hadoop metastore, backup and restore procedures for the Namenode, configuring a Secondary Namenode, and various ways of recovering a lost Namenode. It also covers configuring HDFS and YARN logs for troubleshooting.

Chapter 7, Data Ingestion and Workflow, covers Hive configuration and Hive's various modes of operation, including setting up Hive with a credential store and providing highly available access through ZooKeeper. The recipes detail loading data into Hive, partitioning and bucketing, and configuring Hive with an external metastore. The chapter also covers installing Oozie and configuring Flume for log ingestion.

Chapter 8, Performance Tuning, covers performance tuning for HDFS, YARN containers, the operating system, and network parameters. It also shows how to optimize a cluster for production by comparing benchmark results across different configurations.

Chapter 9, HBase and RDBMS, covers HBase cluster configuration, best practices, tuning, and backup and restore. It also covers migrating data from MySQL to HBase and upgrading HBase to the latest release.

Chapter 10, Cluster Planning, covers Hadoop cluster planning and best practices for designing clusters in terms of disk storage, network, servers, and placement policy. It also covers costing and the impact of SLA-driven workloads on cluster planning.

Chapter 11, Troubleshooting, Diagnostics, and Best Practices, covers troubleshooting steps for the Namenode and Datanodes and explains how to diagnose communication errors. It also explains the relevant logs and how to parse them for errors to extract the key details of an issue.

Chapter 12, Security, covers Hadoop security in terms of data encryption, in-transit encryption, SSL configuration, and, most importantly, configuring Kerberos for the Hadoop cluster. It also covers auditing and includes a recipe for securing ZooKeeper.
