What do you get with a Packt Subscription?

Free for first 7 days. $19.99 p/m after that. Cancel any time!

Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!

50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.

Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.

Thousands of reference materials covering every tech concept you need to stay up to date.

Subscribe now

View plans & pricing

Mastering Hadoop

Chapter 2. Advanced MapReduce

MapReduce is a programming model for parallel and distributed processing of data. It consists of two steps: Map and Reduce. These steps are inspired from functional programming, a branch of computer science that deals with mathematical functions as computational units. Properties of functions such as immutability and statelessness are attractive for parallel and distributed processing. They provide a high degree of parallelism and fault tolerance at lower costs and semantic complexity.

In this chapter, we will look at advanced optimizations when running MapReduce jobs on Hadoop clusters. Every MapReduce job has input data and a Map task per split of this data. The Map task calls a map function repeatedly on every record, represented as a key-value pair. The map is a function that transforms data from one domain to another. The intermediate output records of each Map task are shuffled and sorted before transferring it downstream to the Reduce tasks...

Description

Do you want to broaden your Hadoop skill set and take your knowledge to the next level? Do you wish to enhance your knowledge of Hadoop to solve challenging data processing problems? Are your Hadoop jobs, Pig scripts, or Hive queries not working as fast as you intend? Are you looking to understand the benefits of upgrading Hadoop? If the answer is yes to any of these, this book is for you. It assumes novice-level familiarity with Hadoop.

What do you get with a Packt Subscription?

Free for first 7 days. $19.99 p/m after that. Cancel any time!

Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!

50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.

Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.

Thousands of reference materials covering every tech concept you need to stay up to date.

Subscribe now

View plans & pricing

Frequently bought together

Mastering Hadoop

Dec 2014 374 pages

4 (3)

eBook

€8.99 ~~€32.99~~

Learning Hadoop 2

Feb 2015 382 pages

3.8 (4)

eBook

€8.99 ~~€32.99~~

Total € 83.98

€41.99

Total € 83.98

Gurmukh Feb 25, 2015

Very well written with simplistic flow. It is great book for beginners as well as intermediate users, who want to learn Hadoop is a logical manner, with right understanding rather then cramming things. The example and the code snippets are a head start to get things started.

Amazon Verified review

Sumit Pal Feb 17, 2015

This is a pretty well written book both in terms of content, the way the author has put forth the concepts and general organization of the book.The content is pretty exhaustive - however this is not a starter book, it is more at the intermediate / expert level. The content of the book shows that the author knows the stuff and has experience working with Hadoop and the intricacies of it.I would recommend it to intermediate level Hadoop Developers to have a look at the book

vj Mar 10, 2015

this book is definitely recommended for both beginner and intermediate users. It got example showing the workings of various Hadoop ecosystem YARN, PIG, Hive Storm to name some. There are lots of good examples in the book with code. Of-course some readers might find it unnecessary to have the code printed in the book taking up space, but for me its a plus.

Mastering Hadoop: Go beyond the basics and master the next generation of Hadoop data processing platforms

What do you get with a Packt Subscription?

Mastering Hadoop

Chapter 2. Advanced MapReduce

MapReduce input

The InputFormat class

The RecordReader class

Hadoop's "small files" problem

Filtering inputs

The Map task

Tip

The dfs.blocksize attribute

MapReduce input

The InputFormat class

MapReduce output

Speculative execution of tasks

The Map task

Tip

The dfs.blocksize attribute

The Reduce task

Tip

Note

MapReduce output

Speculative execution of tasks

Page 1 of 11

Description

Product Details

What do you get with a Packt Subscription?

Product Details

Frequently bought together

Table of Contents

Recommendations for you

Customer reviews

People who bought this also bought

About the author

FAQs

Mastering Hadoop: Go beyond the basics and master the next generation of Hadoop data processing platforms

What do you get with a Packt Subscription?

The InputFormat class

Description

Product Details

What do you get with a Packt Subscription?

Product Details

Packt Subscriptions

Frequently bought together

Table of Contents

Recommendations for you

Customer reviews

People who bought this also bought

About the author

FAQs