Apache Hadoop 3 Quick Start Guide

You're reading from Apache Hadoop 3 Quick Start Guide: Learn about big data processing and analytics

Product type: Paperback
Published in: Oct 2018
Publisher: Packt
ISBN-13: 9781788999830
Length: 220 pages
Edition: 1st Edition
Author: Hrishikesh Vijay Karambelkar
Table of Contents

Preface
1. Hadoop 3.0 - Background and Introduction
2. Planning and Setting Up Hadoop Clusters
3. Deep Dive into the Hadoop Distributed File System
4. Developing MapReduce Applications
5. Building Rich YARN Applications
6. Monitoring and Administration of a Hadoop Cluster
7. Demystifying Hadoop Ecosystem Components
8. Advanced Topics in Apache Hadoop
9. Other Books You May Enjoy

Hadoop 3.0 - Background and Introduction

"There were 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every two days."
– Eric Schmidt of Google, 2010

The world is evolving rapidly, from automated call assistance to smart devices that make intelligent decisions, and from self-driving cars to humanoid robots, all driven by the processing and analysis of large amounts of data. We are rapidly approaching a new era: the data age. The IDC whitepaper on data evolution (https://www.seagate.com/www-content/our-story/trends/files/Seagate-WP-DataAge2025-March-2017.pdf), published in 2017, predicts that data volumes will reach 163 zettabytes (1 zettabyte = 1 billion terabytes) by 2025. This will involve the digitization of all the analog data that we see between now and then. This flood of data will come from a broad variety of device types, including IoT devices (sensor data) in industrial plants as well as home devices, smart meters, social media, wearables, mobile phones, and so on.
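To make the units in this forecast concrete, here is a minimal Python sketch that converts between decimal (SI) storage units. It assumes powers of 1,000, the convention storage vendors typically use, rather than binary units such as TiB or ZiB; the `BYTES_PER_UNIT` table and the `convert` helper are hypothetical names introduced only for illustration.

```python
# Decimal (SI) storage units: each step up is a factor of 1,000.
# Assumption: binary (1,024-based) units such as TiB/ZiB are NOT used here.
BYTES_PER_UNIT = {
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
    "ZB": 10**21,
}


def convert(amount, from_unit, to_unit):
    """Convert a whole-number storage amount from one decimal unit to another.

    Integer floor division is exact when converting to a smaller unit,
    which is the only direction this sketch uses.
    """
    return amount * BYTES_PER_UNIT[from_unit] // BYTES_PER_UNIT[to_unit]


tb_per_zb = convert(1, "ZB", "TB")
forecast_tb = convert(163, "ZB", "TB")

print(f"1 ZB = {tb_per_zb:,} TB")
print(f"163 ZB = {forecast_tb:,} TB")
```

Running it prints 1 ZB as 1,000,000,000 TB (one billion terabytes, or equivalently one trillion gigabytes) and 163 ZB as 163,000,000,000 TB. With binary units the figures would differ slightly.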

In our day-to-day lives, we have seen ourselves participating in this evolution. For example, I started using a mobile phone in 2000, and at that time it had basic functions such as calls, a torch, a radio, and SMS. My phone could barely generate any data. Today, I use a 4G LTE smartphone capable of transmitting gigabytes of data, including my photos, my navigation history, and the health parameters from my smartwatch, to different devices over the internet. This data is being used effectively to make smart decisions.

Let's look at some real-world examples of big data:

  • Companies such as Facebook and Instagram use face recognition tools to identify and classify photos and, by comparing them, to suggest friends
  • Companies such as Google and Amazon analyze human behavior based on navigation patterns and location data to provide automated shopping recommendations
  • Many government organizations analyze information from CCTV cameras, social media feeds, network traffic, phone data, and bookings to trace criminals and predict potential threats and terrorist attacks
  • Companies apply sentiment analysis to message posts and tweets to improve the quality of their products and their brand equity, and to target business growth
  • Every minute, we send 204 million emails, view 20 million photos on Flickr, perform 2 million searches on Google, and generate 1.8 million likes on Facebook (Source)

With this data growth comes the demand to store, process, and analyze data faster and at scale. So, the question is: are we ready to meet these demands? Year after year, computer systems have evolved, and so has the capacity of storage media; however, the speed at which data can be read and written has yet to catch up with these demands. Similarly, data arriving from various sources and in various forms needs to be correlated to produce meaningful information. For example, by combining my mobile phone location information, billing information, and credit card details, someone could derive my food preferences, social status, and financial strength. The good news is that working with big data holds a lot of potential, and today companies are barely scratching the surface. Unfortunately, we are still struggling to deal with storage and processing problems.
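The gap between storage capacity and read/write speed is easiest to see with a back-of-envelope calculation. The Python sketch below estimates how long a full sequential scan of a data set takes. The 1 TB data set and the 100 MB/s per-disk throughput are illustrative assumptions, not figures from this chapter, and `scan_seconds` is a hypothetical helper that assumes the data is split evenly across disks that are all read in parallel.

```python
# Back-of-envelope: time to read a whole data set sequentially.
# All figures are illustrative assumptions, not measurements:
#   - 1 TB of data (decimal units: 10**12 bytes)
#   - each disk streams at 100 MB/s (10**8 bytes per second)
DATA_BYTES = 10**12
DISK_THROUGHPUT_BPS = 10**8


def scan_seconds(data_bytes, throughput_bps, disks=1):
    """Seconds to read data_bytes split evenly across `disks` disks,
    all read in parallel at the same per-disk throughput."""
    return data_bytes / (throughput_bps * disks)


for disks in (1, 10, 100):
    secs = scan_seconds(DATA_BYTES, DISK_THROUGHPUT_BPS, disks)
    print(f"{disks:>3} disk(s): {secs:>8,.0f} s ({secs / 3600:.2f} h)")
```

Under these assumptions a single disk needs roughly 10,000 seconds (about 2.8 hours) to read 1 TB, while spreading the same data over 100 disks read in parallel cuts the scan to about 100 seconds. Adding capacity alone does not shorten the scan; reading many disks at once does.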

This chapter provides the background you need to get started with Apache Hadoop. It covers the following key topics:

  • How it all started
  • What Apache Hadoop is and why it is important
  • How Apache Hadoop works
  • Hadoop 3.0 releases and new features
  • Choosing the right Hadoop distribution