Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds
Hadoop Beginner's Guide
Hadoop Beginner's Guide

Hadoop Beginner's Guide: Get your mountain of data under control with Hadoop. This guide requires no prior knowledge of the software or cloud services – just a willingness to learn the basics from this practical step-by-step tutorial.

eBook
NZ$14.99 NZ$64.99
Paperback
NZ$80.99
Subscription
Free Trial

What do you get with Print?

Product feature icon Instant access to your digital eBook copy whilst your Print order is Shipped
Product feature icon Paperback book shipped to your preferred address
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
OR
Modal Close icon
Payment Processing...
tick Completed

Shipping Address

Billing Address

Shipping Methods
Table of content icon View table of contents Preview book icon Preview Book

Hadoop Beginner's Guide

Chapter 2. Getting Hadoop Up and Running

Now that we have explored the opportunities and challenges presented by large-scale data processing and why Hadoop is a compelling choice, it's time to get things set up and running.

In this chapter, we will do the following:

  • Learn how to install and run Hadoop on a local Ubuntu host

  • Run some example Hadoop programs and get familiar with the system

  • Set up the accounts required to use Amazon Web Services products such as EMR

  • Create an on-demand Hadoop cluster on Elastic MapReduce

  • Explore the key differences between a local and hosted Hadoop cluster

Hadoop on a local Ubuntu host


For our exploration of Hadoop outside the cloud, we shall give examples using one or more Ubuntu hosts. A single machine (be it a physical computer or a virtual machine) will be sufficient to run all the parts of Hadoop and explore MapReduce. However, production clusters will most likely involve many more machines, so having even a development Hadoop cluster deployed on multiple hosts will be good experience. However, for getting started, a single host will suffice.

Nothing we discuss will be unique to Ubuntu, and Hadoop should run on any Linux distribution. Obviously, you may have to alter how the environment is configured if you use a distribution other than Ubuntu, but the differences should be slight.

Other operating systems

Hadoop does run well on other platforms. Windows and Mac OS X are popular choices for developers. Windows is supported only as a development platform and Mac OS X is not formally supported at all.

If you choose to use such a platform, the...

Time for action – checking the prerequisites


Hadoop is written in Java, so you will need a recent Java Development Kit (JDK) installed on the Ubuntu host. Perform the following steps to check the prerequisites:

  1. First, check what's already available by opening up a terminal and typing the following:

    $ javac
    $ java -version
    
  2. If either of these commands gives a no such file or directory or similar error, or if the latter mentions "Open JDK", it's likely you need to download the full JDK. Grab this from the Oracle download page at http://www.oracle.com/technetwork/java/javase/downloads/index.html; you should get the latest release.

  3. Once Java is installed, add the JDK/bin directory to your path and set the JAVA_HOME environment variable with commands such as the following, modified for your specific Java version:

    $ export JAVA_HOME=/opt/jdk1.6.0_24
    $ export PATH=$JAVA_HOME/bin:${PATH}
    

What just happened?

These steps ensure the right version of Java is installed and available from the command line...

Time for action – downloading Hadoop


Carry out the following steps to download Hadoop:

  1. Go to the Hadoop download page at http://hadoop.apache.org/common/releases.html and retrieve the latest stable version of the 1.0.x branch; at the time of this writing, it was 1.0.4.

  2. You'll be asked to select a local mirror; after that you need to download the file with a name such as hadoop-1.0.4 -bin.tar.gz.

  3. Copy this file to the directory where you want Hadoop to be installed (for example, /usr/local), using the following command:

    $ cp Hadoop-1.0.4.bin.tar.gz /usr/local
    
  4. Decompress the file by using the following command:

    $ tar –xf hadoop-1.0.4-bin.tar.gz
    
  5. Add a convenient symlink to the Hadoop installation directory.

    $ ln -s /usr/local/hadoop-1.0.4 /opt/hadoop
    
  6. Now you need to add the Hadoop binary directory to your path and set the HADOOP_HOME environment variable, just as we did earlier with Java.

    $ export HADOOP_HOME=/usr/local/Hadoop
    $ export PATH=$HADOOP_HOME/bin:$PATH
    
  7. Go into the conf directory within...

Time for action – setting up SSH


Carry out the following steps to set up SSH:

  1. Create a new OpenSSL key pair with the following commands:

    $ ssh-keygen
    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/hadoop/.ssh/id_rsa): 
    Created directory '/home/hadoop/.ssh'.
    Enter passphrase (empty for no passphrase): 
    Enter same passphrase again: 
    Your identification has been saved in /home/hadoop/.ssh/id_rsa.
    Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
    
    
  2. Copy the new public key to the list of authorized keys by using the following command:

    $ cp .ssh/id _rsa.pub  .ssh/authorized_keys 
    
  3. Connect to the local host.

    $ ssh localhost
    The authenticity of host 'localhost (127.0.0.1)' can't be established.
    RSA key fingerprint is b6:0c:bd:57:32:b6:66:7c:33:7b:62:92:61:fd:ca:2a.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
    
  4. Confirm that the password-less SSH is working.

    $ ssh localhost...

Time for action – using Hadoop to calculate Pi


We will now use a sample Hadoop program to calculate the value of Pi. Right now, this is primarily to validate the installation and to show how quickly you can get a MapReduce job to execute. Assuming the HADOOP_HOME/bin directory is in your path, type the following commands:

$ Hadoop jar hadoop/hadoop-examples-1.0.4.jar  pi 4 1000
Number of Maps  = 4
Samples per Map = 1000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Starting Job
12/10/26 22:56:11 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/10/26 22:56:11 INFO mapred.FileInputFormat: Total input paths to process : 4
12/10/26 22:56:12 INFO mapred.JobClient: Running job: job_local_0001
12/10/26 22:56:12 INFO mapred.FileInputFormat: Total input paths to process : 4
12/10/26 22:56:12 INFO mapred.MapTask: numReduceTasks: 1

12/10/26 22:56:14 INFO mapred.JobClient:  map 100% reduce 100%
12/10/26 22:56:14...

Time for action – configuring the pseudo-distributed mode


Take a look in the conf directory within the Hadoop distribution. There are many configuration files, but the ones we need to modify are core-site.xml, hdfs-site.xml and mapred-site.xml.

  1. Modify core-site.xml to look like the following code:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
    <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    </property>
    </configuration>
  2. Modify hdfs-site.xml to look like the following code:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
    <property>
    <name>dfs.replication</name>
    <value>1</value>
    </property>
    </configuration>
  3. Modify mapred...

Time for action – changing the base HDFS directory


Let's first set the base directory that specifies the location on the local filesystem under which Hadoop will keep all its data. Carry out the following steps:

  1. Create a directory into which Hadoop will store its data:

    $ mkdir /var/lib/hadoop
    
  2. Ensure the directory is writeable by any user:

    $ chmod 777 /var/lib/hadoop
    
  3. Modify core-site.xml once again to add the following property:

    <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/lib/hadoop</value>
    </property>

What just happened?

As we will be storing data in Hadoop and all the various components are running on our local host, this data will need to be stored on our local filesystem somewhere. Regardless of the mode, Hadoop by default uses the hadoop.tmp.dir property as the base directory under which all files and data are written.

MapReduce, for example, uses a /mapred directory under this base directory; HDFS uses /dfs. The danger is that the default value...

Time for action – formatting the NameNode


Before starting Hadoop in either pseudo-distributed or fully distributed mode for the first time, we need to format the HDFS filesystem that it will use. Type the following:

$  hadoop namenode -format

The output of this should look like the following:

$ hadoop namenode -format
12/10/26 22:45:25 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = vm193/10.0.0.193
STARTUP_MSG:   args = [-format]

12/10/26 22:45:25 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop
12/10/26 22:45:25 INFO namenode.FSNamesystem: supergroup=supergroup
12/10/26 22:45:25 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/10/26 22:45:25 INFO common.Storage: Image file of size 96 saved in 0 seconds.
12/10/26 22:45:25 INFO common.Storage: Storage directory /var/lib/hadoop-hadoop/dfs/name has been successfully formatted.
12/10/26 22:45:26 INFO namenode.NameNode: SHUTDOWN_MSG...

Time for action – starting Hadoop


Unlike the local mode of Hadoop, where all the components run only for the lifetime of the submitted job, with the pseudo-distributed or fully distributed mode of Hadoop, the cluster components exist as long-running processes. Before we use HDFS or MapReduce, we need to start up the needed components. Type the following commands; the output should look as shown next, where the commands are included on the lines prefixed by $:

  1. Type in the first command:

    $ start-dfs.sh
    starting namenode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-namenode-vm193.out
    localhost: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-vm193.out
    localhost: starting secondarynamenode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-vm193.out
    
  2. Type in the second command:

    $ jps
    9550 DataNode
    9687 Jps
    9638 SecondaryNameNode
    9471 NameNode
    
  3. Type in the third command:

    $ hadoop dfs -ls /
    Found 2 items
    drwxr-xr-x   - hadoop...

Time for action – using HDFS


As the preceding example shows, there is a familiar-looking interface to HDFS that allows us to use commands similar to those in Unix to manipulate files and directories on the filesystem. Let's try it out by typing the following commands:

Type in the following commands:

$ hadoop -mkdir /user
$ hadoop -mkdir /user/hadoop
$ hadoop fs -ls /user
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2012-10-26 23:09 /user/Hadoop
$ echo "This is a test." >> test.txt
$ cat test.txt
This is a test.
$ hadoop dfs -copyFromLocal test.txt  .
$ hadoop dfs -ls
Found 1 items
-rw-r--r--   1 hadoop supergroup         16 2012-10-26 23:19/user/hadoop/test.txt
$ hadoop dfs -cat test.txt
This is a test.
$ rm test.txt 
$ hadoop dfs -cat test.txt
This is a test.
$ hadoop fs -copyToLocal test.txt
$ cat test.txt
This is a test.

What just happened?

This example shows the use of the fs subcommand to the Hadoop utility. Note that both dfs and fs commands are equivalent). Like...

Time for action – WordCount, the Hello World of MapReduce


Many applications, over time, acquire a canonical example that no beginner's guide should be without. For Hadoop, this is WordCount – an example bundled with Hadoop that counts the frequency of words in an input text file.

  1. First execute the following commands:

    $ hadoop dfs -mkdir data
    $ hadoop dfs -cp test.txt data
    $ hadoop dfs -ls data
    Found 1 items
    -rw-r--r--   1 hadoop supergroup         16 2012-10-26 23:20 /user/hadoop/data/test.txt
    
  2. Now execute these commands:

    $ Hadoop Hadoop/hadoop-examples-1.0.4.jar  wordcount data out
    12/10/26 23:22:49 INFO input.FileInputFormat: Total input paths to process : 1
    12/10/26 23:22:50 INFO mapred.JobClient: Running job: job_201210262315_0002
    12/10/26 23:22:51 INFO mapred.JobClient:  map 0% reduce 0%
    12/10/26 23:23:03 INFO mapred.JobClient:  map 100% reduce 0%
    12/10/26 23:23:15 INFO mapred.JobClient:  map 100% reduce 100%
    12/10/26 23:23:17 INFO mapred.JobClient: Job complete: job_201210262315_0002...

Using Elastic MapReduce


We will now turn to Hadoop in the cloud, the Elastic MapReduce service offered by Amazon Web Services. There are multiple ways to access EMR, but for now we will focus on the provided web console to contrast a full point-and-click approach to Hadoop with the previous command-line-driven examples.

Setting up an account in Amazon Web Services

Before using Elastic MapReduce, we need to set up an Amazon Web Services account and register it with the necessary services.

Creating an AWS account

Amazon has integrated their general accounts with AWS, meaning that if you already have an account for any of the Amazon retail websites, this is the only account you will need to use AWS services.

Note that AWS services have a cost; you will need an active credit card associated with the account to which charges can be made.

If you require a new Amazon account, go to http://aws.amazon.com, select create a new AWS account, and follow the prompts. Amazon has added a free tier for some services...

Time for action – WordCount on EMR using the management console


Let's jump straight into an example on EMR using some provided example code. Carry out the following steps:

  1. Browse to http://aws.amazon.com, go to Developers | AWS Management Console, and then click on the Sign in to the AWS Console button. The default view should look like the following screenshot. If it does not, click on Amazon S3 from within the console.

  2. As shown in the preceding screenshot, click on the Create bucket button and enter a name for the new bucket. Bucket names must be globally unique across all AWS users, so do not expect obvious bucket names such as mybucket or s3test to be available.

  3. Click on the Region drop-down menu and select the geographic area nearest to you.

  4. Click on the Elastic MapReduce link and click on the Create a new Job Flow button. You should see a screen like the following screenshot:

  5. You should now see a screen like the preceding screenshot. Select the Run a sample application radio button and...

Comparison of local versus EMR Hadoop


After our first experience of both a local Hadoop cluster and its equivalent in EMR, this is a good point at which we can consider the differences of the two approaches.

As may be apparent, the key differences are not really about capability; if all we want is an environment to run MapReduce jobs, either approach is completely suited. Instead, the distinguishing characteristics revolve around a topic we touched on in Chapter 1, What It's All About, that being whether you prefer a cost model that involves upfront infrastructure costs and ongoing maintenance effort over one with a pay-as-you-go model with a lower maintenance burden along with rapid and conceptually infinite scalability. Other than the cost decisions, there are a few things to keep in mind:

  • EMR supports specific versions of Hadoop and has a policy of upgrading over time. If you have a need for a specific version, in particular if you need the latest and greatest versions immediately after...

Summary


We covered a lot of ground in this chapter, in regards to getting a Hadoop cluster up and running and executing MapReduce programs on it.

Specifically, we covered the prerequisites for running Hadoop on local Ubuntu hosts. We also saw how to install and configure a local Hadoop cluster in either standalone or pseudo-distributed modes. Then, we looked at how to access the HDFS filesystem and submit MapReduce jobs. We then moved on and learned what accounts are needed to access Elastic MapReduce and other AWS services.

We saw how to browse and create S3 buckets and objects using the AWS management console, and also how to create a job flow and use it to execute a MapReduce job on an EMR-hosted Hadoop cluster. We also discussed other ways of accessing AWS services and studied the differences between local and EMR-hosted Hadoop.

Now that we have learned about running Hadoop locally or on EMR, we are ready to start writing our own MapReduce programs, which is the topic of the next chapter...

Left arrow icon Right arrow icon

Key benefits

  • Learn tools and techniques that let you approach big data with relish and not fear
  • Shows how to build a complete infrastructure to handle your needs as your data grows
  • Hands-on examples in each chapter give the big picture while also giving direct experience

Description

Data is arriving faster than you can process it and the overall volumes keep growing at a rate that keeps you awake at night. Hadoop can help you tame the data beast. Effective use of Hadoop however requires a mixture of programming, design, and system administration skills."Hadoop Beginner's Guide" removes the mystery from Hadoop, presenting Hadoop and related technologies with a focus on building working systems and getting the job done, using cloud services to do so when it makes sense. From basic concepts and initial setup through developing applications and keeping the system running as the data grows, the book gives the understanding needed to effectively use Hadoop to solve real world problems.Starting with the basics of installing and configuring Hadoop, the book explains how to develop applications, maintain the system, and how to use additional products to integrate with other systems.While learning different ways to develop applications to run on Hadoop the book also covers tools such as Hive, Sqoop, and Flume that show how Hadoop can be integrated with relational databases and log collection.In addition to examples on Hadoop clusters on Ubuntu uses of cloud services such as Amazon, EC2 and Elastic MapReduce are covered.

Who is this book for?

This book assumes no existing experience with Hadoop or cloud services. It assumes you have familiarity with a programming language such as Java or Ruby but gives you the needed background on the other topics.

What you will learn

  • The trends that led to Hadoop and cloud services, giving the background to know when to use the technology
  • Best practices for setup and configuration of Hadoop clusters, tailoring the system to the problem at hand
  • Developing applications to run on Hadoop with examples in Java and Ruby
  • How Amazon Web Services can be used to deliver a hosted Hadoop solution and how this differs from directly-managed environments
  • Integration with relational databases, using Hive for SQL queries and Sqoop for data transfer
  • How Flume can collect data from multiple sources and deliver it to Hadoop for processing
  • What other projects and tools make up the broader Hadoop ecosystem and where to go next
Estimated delivery fee Deliver to New Zealand

Standard delivery 10 - 13 business days

NZ$20.95

Premium delivery 5 - 8 business days

NZ$74.95
(Includes tracking information)

Product Details

Country selected
Publication date, Length, Edition, Language, ISBN-13
Publication date : Feb 22, 2013
Length: 398 pages
Edition : 1st
Language : English
ISBN-13 : 9781849517300
Vendor :
Apache
Category :
Tools :

What do you get with Print?

Product feature icon Instant access to your digital eBook copy whilst your Print order is Shipped
Product feature icon Paperback book shipped to your preferred address
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
OR
Modal Close icon
Payment Processing...
tick Completed

Shipping Address

Billing Address

Shipping Methods
Estimated delivery fee Deliver to New Zealand

Standard delivery 10 - 13 business days

NZ$20.95

Premium delivery 5 - 8 business days

NZ$74.95
(Includes tracking information)

Product Details

Publication date : Feb 22, 2013
Length: 398 pages
Edition : 1st
Language : English
ISBN-13 : 9781849517300
Vendor :
Apache
Category :
Tools :

Packt Subscriptions

See our plans and pricing
Modal Close icon
$19.99 billed monthly
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Simple pricing, no contract
$199.99 billed annually
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just NZ$7 each
Feature tick icon Exclusive print discounts
$279.99 billed in 18 months
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just NZ$7 each
Feature tick icon Exclusive print discounts

Frequently bought together


Stars icon
Total NZ$ 161.98
Hadoop Beginner's Guide
NZ$80.99
Practical Data Analysis
NZ$80.99
Total NZ$ 161.98 Stars icon
Banner background image

Table of Contents

11 Chapters
What It's All About Chevron down icon Chevron up icon
Getting Hadoop Up and Running Chevron down icon Chevron up icon
Understanding MapReduce Chevron down icon Chevron up icon
Developing MapReduce Programs Chevron down icon Chevron up icon
Advanced MapReduce Techniques Chevron down icon Chevron up icon
When Things Break Chevron down icon Chevron up icon
Keeping Things Running Chevron down icon Chevron up icon
A Relational View on Data with Hive Chevron down icon Chevron up icon
Working with Relational Databases Chevron down icon Chevron up icon
Data Collection with Flume Chevron down icon Chevron up icon
Where to Go Next Chevron down icon Chevron up icon

Customer reviews

Top Reviews
Rating distribution
Full star icon Full star icon Full star icon Half star icon Empty star icon 3.7
(13 Ratings)
5 star 15.4%
4 star 46.2%
3 star 30.8%
2 star 7.7%
1 star 0%
Filter icon Filter
Top Reviews

Filter reviews by




MarcTC Feb 05, 2014
Full star icon Full star icon Full star icon Full star icon Full star icon 5
I am very interested on Big Data and have read a lot of books about the subject. It was time for me to get hands. So I was looking for a beginner book for Hadoop. I already had the knowledge on Big Data as a subject and I also knew the type of problems we are solving. So I looked for a beginner book for Hadoop as I didn't have any prior experience with the platform. I am using Mac OSX machine. I know my way around linux/Unix shell.I started working through the book. The first chapter is a great short and sweet introduction to Hadoop, MapReduce and HDFS. Its not too lengthy. Just the right length for a person like me who gets bored pretty quickly with a lot of information. From Chapter 2, the real hands on work started. Although the instructions are mainly for Ubuntu linux, it worked like a charm in Mac. I didnt have any issues that other reviewers mentioned. when you run the first example, i couldn't use the exact command given in the book to run the first example of Pi calculation simply because i was using the latest version of Hadoop so the Jar name was different. Thats not something that the author can fix. once you give the correct jar name, it worked like a charm. Im now moving alone to the other chapters very quickly. Its a really good book if you just wanna familiarize yourself with the product, which is exactly the book is about. its well written. easy to follow. Author doesnt try to bore you to death.I recommend reading the below book if you first wanna understand what the Big Data is all about about and see what type of issues its trying solvehttp://www.amazon.com/gp/product/B009N08NKW/ref=cm_cr_ryp_prd_ttl_sol_39
Amazon Verified review Amazon
Cheng Sep 24, 2014
Full star icon Full star icon Full star icon Full star icon Full star icon 5
A very good book for the beginner
Amazon Verified review Amazon
Amazon Customer Nov 08, 2014
Full star icon Full star icon Full star icon Full star icon Empty star icon 4
I like this book, it is very clear and has good examples to try out. I don't give it 5 stars because many of the examples have mistakes, many of them very obvious.
Amazon Verified review Amazon
Varun Muriyanat Sep 29, 2013
Full star icon Full star icon Full star icon Full star icon Empty star icon 4
Very good for beginners. Gives both the big picture, as well as the start up implementation for newbies. Execellent choice.
Amazon Verified review Amazon
Alexander Tarnowski Jun 03, 2013
Full star icon Full star icon Full star icon Full star icon Empty star icon 4
I read this book "out of context", meaning that I didn't have an interesting problem solvable by MapReduce at hand and a dire need to learn Hadoop at the time of reading. Instead, I took time to read this book with the purpose of determining whether it's a good beginner book or not. All in all, I'd say that it is. The author really succeeds in creating a context for Hadoop and its ecosystem.From the second chapter and onwards, Hadoop is gradually introduced using very detailed instructions. The general format for doing this is by listing every single command the user needs to type and its output, so the book is full of terminal session listings. All such listings are followed by sections called "What just happened?" that explain in detail the purpose of the commands and their output. This is actually quite helpful for readers who understand what's happening from just looking at the session listing; such readers can safely skip these sections.The above approach should enable any reader, regardless of level of experience, to follow along and do the exercises or labs, which is a good thing for a beginner book. I have a remark about this though: the session dumps could have been proofread better! I can't say that I read them through a magnifying glass, but still I found quite a few errors.As for the contents, the book never shows the monster! In my opinion, the introductory chapter fails to actually establish a case for Hadoop and MapReduce. Yes, it's about big data, scaling and problems and so on, but I couldn't find a logical transition to Hadoop as a solution to these problems. Instead, chapter two illustrates the framework with a distributed calculation of pi and the word counting program (Hadoop's version of the "Hello world" program).In a later chapter, Hadoop is used to process a dataset with UFO sightings, and then, in a chapter on advanced techniques a graph problem is solved. Not until that chapter did I start getting a feeling for what kind of problems Hadoop and MapReduce should be used for. This is what I mean by "never showing the monster". Being an introductory text, I'd prefer the first or second chapter to describe some problems that are good candidates for the MapReduce paradigm, illustrate one of them, and then show how a distributed computation would help.That said, I may be off track here. This is a book on Hadoop, and not MapReduce in general, and I did say that I read it without having intricate MapReduce problems at hand. This is pretty much my only criticism. If a reader doesn't perceive this as a problem, then there's nothing to complain about. After reading the book, I feel that I have a very good feeling for what Hadoop does and what building blocks in its ecosystem to use.One more thing... Here and there the book contains examples of how to use Amazon's EMR. This didn't feel awfully important to me, but it provides an even more solid explanation of how to apply the framework to bigger problems and how it can be used in a cloud environment.To sum up: a good and comprehensive book on Hadoop that covers the framework and its ecosystem, verbose and easy to follow examples, and a structure that leaves the reader with a sense of getting the big picture.Minuses: could devote some more pages to the MapReduce paradigm and have its sample listings proofread better.
Amazon Verified review Amazon
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

What is the delivery time and cost of print book? Chevron down icon Chevron up icon

Shipping Details

USA:

'

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-bissau
  9. Iran
  10. Lebanon
  11. Libiya Arab Jamahriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela
What is custom duty/charge? Chevron down icon Chevron up icon

Customs duty are charges levied on goods when they cross international borders. It is a tax that is imposed on imported goods. These duties are charged by special authorities and bodies created by local governments and are meant to protect local industries, economies, and businesses.

Do I have to pay customs charges for the print book order? Chevron down icon Chevron up icon

The orders shipped to the countries that are listed under EU27 will not bear custom charges. They are paid by Packt as part of the order.

List of EU27 countries: www.gov.uk/eu-eea:

A custom duty or localized taxes may be applicable on the shipment and would be charged by the recipient country outside of the EU27 which should be paid by the customer and these duties are not included in the shipping charges been charged on the order.

How do I know my custom duty charges? Chevron down icon Chevron up icon

The amount of duty payable varies greatly depending on the imported goods, the country of origin and several other factors like the total invoice amount or dimensions like weight, and other such criteria applicable in your country.

For example:

  • If you live in Mexico, and the declared value of your ordered items is over $ 50, for you to receive a package, you will have to pay additional import tax of 19% which will be $ 9.50 to the courier service.
  • Whereas if you live in Turkey, and the declared value of your ordered items is over € 22, for you to receive a package, you will have to pay additional import tax of 18% which will be € 3.96 to the courier service.
How can I cancel my order? Chevron down icon Chevron up icon

Cancellation Policy for Published Printed Books:

You can cancel any order within 1 hour of placing the order. Simply contact customercare@packt.com with your order details or payment transaction id. If your order has already started the shipment process, we will do our best to stop it. However, if it is already on the way to you then when you receive it, you can contact us at customercare@packt.com using the returns and refund process.

Please understand that Packt Publishing cannot provide refunds or cancel any order except for the cases described in our Return Policy (i.e. Packt Publishing agrees to replace your printed book because it arrives damaged or material defect in book), Packt Publishing will not accept returns.

What is your returns and refunds policy? Chevron down icon Chevron up icon

Return Policy:

We want you to be happy with your purchase from Packtpub.com. We will not hassle you with returning print books to us. If the print book you receive from us is incorrect, damaged, doesn't work or is unacceptably late, please contact Customer Relations Team on customercare@packt.com with the order number and issue details as explained below:

  1. If you ordered (eBook, Video or Print Book) incorrectly or accidentally, please contact Customer Relations Team on customercare@packt.com within one hour of placing the order and we will replace/refund you the item cost.
  2. Sadly, if your eBook or Video file is faulty or a fault occurs during the eBook or Video being made available to you, i.e. during download then you should contact Customer Relations Team within 14 days of purchase on customercare@packt.com who will be able to resolve this issue for you.
  3. You will have a choice of replacement or refund of the problem items.(damaged, defective or incorrect)
  4. Once Customer Care Team confirms that you will be refunded, you should receive the refund within 10 to 12 working days.
  5. If you are only requesting a refund of one book from a multiple order, then we will refund you the appropriate single item.
  6. Where the items were shipped under a free shipping offer, there will be no shipping costs to refund.

On the off chance your printed book arrives damaged, with book material defect, contact our Customer Relation Team on customercare@packt.com within 14 days of receipt of the book with appropriate evidence of damage and we will work with you to secure a replacement copy, if necessary. Please note that each printed book you order from us is individually made by Packt's professional book-printing partner which is on a print-on-demand basis.

What tax is charged? Chevron down icon Chevron up icon

Currently, no tax is charged on the purchase of any print book (subject to change based on the laws and regulations). A localized VAT fee is charged only to our European and UK customers on eBooks, Video and subscriptions that they buy. GST is charged to Indian customers for eBooks and video purchases.

What payment methods can I use? Chevron down icon Chevron up icon

You can pay with the following card types:

  1. Visa Debit
  2. Visa Credit
  3. MasterCard
  4. PayPal
What is the delivery time and cost of print books? Chevron down icon Chevron up icon

Shipping Details

USA:

'

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-bissau
  9. Iran
  10. Lebanon
  11. Libiya Arab Jamahriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela