Hadoop 2.x Administration Cookbook

You're reading from Hadoop 2.x Administration Cookbook: Administer and maintain large Apache Hadoop clusters

Product type: Paperback

Published in: May 2017

Publisher: Packt

ISBN-13: 9781787126732

Length: 348 pages

Edition: 1st Edition

Tools: Hadoop

Concepts: System Administration

Author (1): Aman Singh

Configuring MapReduce for performance

In this recipe, we will touch upon MapReduce parameters and see how we can optimize them.

Getting ready

For this recipe, you will again need a running cluster with HDFS and YARN. You should also have completed the Configuring YARN for performance recipe.

How to do it...

Connect to the master node master1.cyrus.com and switch to the hadoop user.
The file where these changes will be made is mapred-site.xml.
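The first thing to adjust is the sort buffer, which is sized according to the HDFS block size. As a rule, it should be greater than the value of dfs.blocksize. Note that mapreduce.task.io.sort.mb is given in megabytes, while dfs.blocksize is given in bytes unless a unit suffix such as 128m is used. This can be configured as follows: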
```
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>200</value>
</property>
```
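The comparison between the sort buffer and the block size can be checked with a short script. The following is a minimal sketch, not part of the recipe: the function names are hypothetical, the XML is passed in as text, and the fallback values (100 MB for mapreduce.task.io.sort.mb, 134217728 bytes for dfs.blocksize) are assumed to match the stock Hadoop 2.x defaults.

```python
import xml.etree.ElementTree as ET

# Binary unit suffixes commonly used for size-valued properties such as
# dfs.blocksize (only a subset is handled in this sketch).
_SUFFIXES = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def read_properties(xml_text):
    """Return {name: value} for every <property> in a *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def parse_size_bytes(value):
    """Convert a plain byte count ('134217728') or suffixed size ('128m') to bytes."""
    value = value.strip().lower()
    if value and value[-1] in _SUFFIXES:
        return int(value[:-1]) * _SUFFIXES[value[-1]]
    return int(value)


def sort_buffer_exceeds_block(mapred_xml, hdfs_xml):
    """True if the sort buffer (in MB) is larger than the HDFS block size."""
    mapred = read_properties(mapred_xml)
    hdfs = read_properties(hdfs_xml)
    # Assumed defaults: 100 MB sort buffer, 128 MB block size.
    sort_mb = int(mapred.get("mapreduce.task.io.sort.mb", "100"))
    block_bytes = parse_size_bytes(hdfs.get("dfs.blocksize", "134217728"))
    return sort_mb * 1024 * 1024 > block_bytes
```

On a cluster node, the two arguments would typically be the contents of mapred-site.xml and hdfs-site.xml read from the Hadoop configuration directory.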
The next value to tune is the number of streams to merge while sorting. This many file handles will be open per mapper:
```
<property>
  <name>mapreduce.task.io.sort.factor</name>
  <value>24</value>
</property>
```
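To get a feel for the trade-off behind the merge factor, the sketch below estimates how many merge rounds a given number of spill files needs. It is a deliberately simplified model, assuming every round merges up to the configured number of streams into one file; Hadoop's actual merge logic differs in detail, and the function name is hypothetical.

```python
def merge_rounds(spill_files, sort_factor):
    """Estimate merge rounds needed to reduce spill_files to one output.

    Simplified model (an assumption, not Hadoop's exact algorithm): each
    round merges groups of up to sort_factor files into one file each.
    """
    if sort_factor < 2:
        raise ValueError("sort_factor must be at least 2")
    rounds = 0
    remaining = spill_files
    while remaining > 1:
        # Ceiling division: number of merged files produced by this round.
        remaining = -(-remaining // sort_factor)
        rounds += 1
    return rounds


if __name__ == "__main__":
    for factor in (10, 24, 100):
        print(factor, merge_rounds(500, factor))
```

Under this model, a larger factor means fewer rounds over the data at the cost of more file handles open at once, which is the balance the setting controls.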
Another important thing to take...

Authors (1)

Aman Singh

Gurmukh Singh is a seasoned technology professional with 14+ years of industry experience in infrastructure design, distributed systems, performance optimization, and networks. He has worked in the big data domain for the last 5 years and provides consultancy and training on various technologies. He has worked with companies such as HP, JP Morgan, and Yahoo. He is also the author of Monitoring Hadoop, published by Packt Publishing.
