Mastering Hadoop 3: Big data processing at scale to unlock unique business insights

Authors: Wong, Singh, Kumar

Rating: 5 out of 5 (1 rating)

eBook, Feb 2019, 544 pages, 1st Edition

eBook: ₱579.99 (regular price ₱2490.99)
Paperback: ₱3112.99
Subscription: Free Trial

What do you get with eBook?

  • Instant access to your digital eBook purchase
  • Download this book in EPUB and PDF formats
  • Access this title in our online reader with advanced features
  • DRM-free: read whenever, wherever, and however you want
  • AI Assistant (beta) to help accelerate your learning

Mastering Hadoop 3

Journey to Hadoop 3

Hadoop has come a long way since its inception. Powered by a community of open source enthusiasts, it has seen three major version releases. Version 1 arrived six years after the first release of Hadoop. With this release, the Hadoop platform could fully run MapReduce distributed computing on top of Hadoop Distributed File System (HDFS) distributed storage. It included some of the biggest performance improvements in the project's history, along with full support for security. This release also brought many improvements for HBase.

Version 2 made significant leaps compared to version 1. It introduced YARN, a sophisticated general-purpose resource management and job scheduling component. HDFS high availability, HDFS federation, and HDFS snapshots were some of the other prominent features introduced in version 2...

Hadoop origins and timelines

Hadoop is changing the way people think about data. To understand it, we need to know what led to its creation. Who developed Hadoop and why? What problems existed before Hadoop, and how has it solved them? What challenges were encountered during its development? How has Hadoop evolved from version 1 to version 3? Let's walk through the origins of Hadoop and its journey to version 3.

Origins

In 1997, Doug Cutting, a co-founder of Hadoop, started working on the Lucene project, a full-text search library written entirely in Java. Lucene analyzes text and builds an index on it. An index is just a mapping of text to locations, so it...
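The idea of an index as a mapping from text to locations can be illustrated with a toy inverted index. This is a minimal Python sketch under simplifying assumptions (lowercase whitespace tokenization, documents identified by their list position); it is not Lucene's API or on-disk format, and the function names `build_index` and `search` are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to a list of (doc_id, position) pairs (hypothetical helper)."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        # Naive "analysis" step: lowercase the text and split on whitespace.
        for position, term in enumerate(text.lower().split()):
            index[term].append((doc_id, position))
    return dict(index)


def search(index, term):
    """Return the sorted ids of documents that contain the term (hypothetical helper)."""
    return sorted({doc_id for doc_id, _ in index.get(term.lower(), [])})


docs = [
    "Hadoop stores data in HDFS",
    "Lucene builds an index",
    "Hadoop grew out of Nutch",
]
index = build_index(docs)
print(search(index, "hadoop"))  # [0, 2]
print(index["index"])           # [(1, 3)]
```

Because the index is built once up front, a lookup only touches the entry for the query term instead of scanning every document, which is the property that makes full-text search fast.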

Overview of Hadoop 3 and its features

The first alpha release of Hadoop version 3.0.0, called 3.0.0-alpha1, came out on 30 August 2016. It was the first in a series of planned alphas and betas that ultimately led to 3.0.0 GA. The intention behind this alpha release was to quickly gather and act on feedback from downstream users.

Every release has key drivers behind it, and these drivers bring benefits that ultimately help Hadoop-based enterprise applications function better. Before we discuss the features of Hadoop 3, you should understand these driving factors. Some of the driving factors behind the release of Hadoop 3 are as follows:

  • Many bug fixes and performance improvements: Hadoop has a growing open source community of developers who regularly add major and minor changes or improvements...

Hadoop logical view

The Hadoop logical view can be divided into multiple sections. These sections can be viewed as a logical sequence of steps, starting from Ingress/Egress and ending at Data Storage Medium.

The following diagram shows the Hadoop platform logical view:

We will go through the sections shown in the preceding diagram one by one. When designing any Hadoop application, you should think in terms of these sections and make technology choices according to the use case you are trying to solve. Let's look at each section in turn:

  • Ingress/egress/processing: Any interaction with the Hadoop platform should be viewed in terms of the following:
    • Ingesting (ingress) data
    • Reading (egress) data
    • Processing already ingested data

These actions can be automated via the use of...

Hadoop distributions

Hadoop is an open-source project under the Apache Software Foundation, and most components in the Hadoop ecosystem are also open-sourced. Many companies have taken important components and bundled them together to form a complete distribution package that is easier to use and manage. A Hadoop distribution offers the following benefits:

  • Installation: The distribution package provides an easy way to install any component or rpm-like package on clusters. It also provides a simple interface for doing so.
  • Packaging: It comes with multiple open-source tools that are well configured to work together. Imagine installing and configuring each component separately on a multi-node cluster and then testing whether it works properly. What if we forget some test scenarios and the cluster behaves unexpectedly? The Hadoop distribution assures us that...

Points to remember

We provided a basic introduction to Hadoop and the following are a few points to remember:

  • Doug Cutting, a co-founder of Hadoop, started developing Hadoop as part of the Nutch project, based on Google's research papers on the Google File System and MapReduce.
  • Apache Lucene is an open-source full-text search library, initially written in Java by Doug Cutting.
  • Hadoop consists of two important parts: the Hadoop Distributed File System (HDFS) and MapReduce.
  • YARN is a resource management framework used to schedule and run applications such as MapReduce and Spark.
  • Hadoop distributions bundle open source big data tools into a complete package, integrated to work efficiently with each other.
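The points above pair HDFS storage with MapReduce processing. To make the MapReduce model concrete, here is a minimal single-process Python simulation of the classic word-count job: a map phase emits (word, 1) pairs, a shuffle groups the values by key, and a reduce phase sums them. This is an illustrative sketch only; it does not use Hadoop's Java `Mapper`/`Reducer` API or run on a cluster, and the helper names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all values by key, as the framework does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Sum the counts emitted for a single word.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    # Reduce each key group; sorting mimics the sorted keys a reducer sees.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


counts = word_count(["HDFS stores data", "MapReduce processes data"])
print(counts)
# {'data': 2, 'hdfs': 1, 'mapreduce': 1, 'processes': 1, 'stores': 1}
```

In a real cluster, the map calls run in parallel close to the HDFS blocks that hold the input, and the shuffle moves the grouped pairs across the network to the reducers; the single-process version keeps only the data flow.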

Summary

In this chapter, we covered Hadoop's origins and how Hadoop evolved over time with more performance-optimized features and tools. We then looked at the logical view of the Hadoop platform and its different layers in detail, and covered Hadoop distributions to help you decide which one to choose. Finally, we described the new features available in Hadoop version 3, which we will discuss in more detail in upcoming chapters.

In the next chapter, we will cover HDFS and walk you through the HDFS architecture and its components in detail. We will go much deeper into the internals of HDFS and its high availability features. We will then look into HDFS read-write operations and how HDFS caching and federation work.


Key benefits

  • Get to grips with the newly introduced features and capabilities of Hadoop 3
  • Crunch and process data using MapReduce, YARN, and a host of tools within the Hadoop ecosystem
  • Sharpen your Hadoop skills with real-world case studies and code

Description

Apache Hadoop is one of the most popular big data solutions for distributed storage and for processing large volumes of data. With Hadoop 3, Apache promises to provide a high-performance, more fault-tolerant, and highly efficient big data processing platform, with a focus on improved scalability and increased efficiency. With this guide, you’ll learn how Hadoop works internally, study advanced concepts of different Hadoop ecosystem tools, discover solutions to real-world use cases, and understand how to secure your cluster. The book then walks you through HDFS, YARN, MapReduce, and Hadoop 3 concepts. You’ll be able to address common challenges like using Kafka efficiently, designing low-latency, reliable message delivery Kafka systems, and handling high data volumes. As you advance, you’ll discover how to address major challenges when building an enterprise-grade messaging system, and how to use different stream processing systems along with Kafka to fulfil your enterprise goals. By the end of this book, you’ll have a complete understanding of how components in the Hadoop ecosystem are effectively integrated to implement a fast and reliable data pipeline, and you’ll be equipped to tackle a range of real-world problems in data pipelines.

Who is this book for?

If you want to become a big data professional by mastering the advanced concepts of Hadoop, this book is for you. You’ll also find this book useful if you’re a Hadoop professional looking to strengthen your knowledge of the Hadoop ecosystem. Fundamental knowledge of the Java programming language and the basics of Hadoop are necessary to get started with this book.

What you will learn

  • Gain an in-depth understanding of distributed computing using Hadoop 3
  • Develop enterprise-grade applications using Apache Spark, Flink, and more
  • Build scalable and high-performance Hadoop data pipelines with security, monitoring, and data governance
  • Explore batch data processing patterns and how to model data in Hadoop
  • Master best practices for enterprises using, or planning to use, Hadoop 3 as a data platform
  • Understand security aspects of Hadoop, including authorization and authentication

Product Details

Publication date: Feb 28, 2019
Length: 544 pages
Edition: 1st
Language: English
ISBN-13: 9781788628327
Vendor: Apache

Frequently bought together

  • Mastering Hadoop 3: ₱3112.99
  • Big Data Analytics with Hadoop 3: ₱2245.99
  • Apache Hadoop 3 Quick Start Guide: ₱1683.99

Total: ₱7,042.97

Table of Contents

20 Chapters
Section 1: Introduction to Hadoop 3
  • Journey to Hadoop 3
  • Deep Dive into the Hadoop Distributed File System
  • YARN Resource Management in Hadoop
  • Internals of MapReduce
Section 2: Hadoop Ecosystem
  • SQL on Hadoop
  • Real-Time Processing Engines
  • Widely Used Hadoop Ecosystem Components
Section 3: Hadoop in the Real World
  • Designing Applications in Hadoop
  • Real-Time Stream Processing in Hadoop
  • Machine Learning in Hadoop
  • Hadoop in the Cloud
  • Hadoop Cluster Profiling
Section 4: Securing Hadoop
  • Who Can Do What in Hadoop
  • Network and Data Security
  • Monitoring Hadoop
Other Books You May Enjoy

Customer reviews

Rating distribution

5 out of 5 (1 rating)

  • 5 star: 100%
  • 4 star: 0%
  • 3 star: 0%
  • 2 star: 0%
  • 1 star: 0%

SuchRave, Mar 14, 2019 (Amazon verified review)
Rating: 5 out of 5
"It's a wonderful book, giving details about overall ecosystem very well. Kudos!"

FAQs

How do I buy and download an eBook?

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe Reader installed, clicking on the link will download and open the PDF file directly. If you don't, save the PDF file on your machine and download Adobe Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and licensing: When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it, we have tried to balance the need for the eBook to be usable for you, the reader, with our need to protect our rights as publishers and those of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else

How can I make a purchase on your website?

If you want to purchase a video course, an eBook, or a bundle (print + eBook), please follow the steps below:

  1. Register on our website using your email address and a password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title. 
  5. Proceed with the checkout process (payment can be made by credit card, debit card, or PayPal).

Where can I get support for an eBook?

  • If you experience a problem with using or installing Adobe Reader, then contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us

What eBook formats does Packt support?

Our eBooks are currently available in a variety of formats, such as PDF and ePub. In the future, this may well change with trends and developments in technology, but please note that our PDFs are not in Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks?

  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They cost less than print books
  • They save resources and space

What is an eBook?

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply log in to your account and click on the link in Your Download Area. We recommend saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.