You're reading from Hadoop Real-World Solutions Cookbook, Second Edition: Over 90 hands-on recipes to help you learn and master the intricacies of Apache Hadoop 2.X, YARN, Hive, Pig, Oozie, Flume, Sqoop, Apache Spark, and Mahout

Product type Paperback

Published in Mar 2016

ISBN-13 9781784395506

Length 290 pages

Edition 2nd Edition

Tools

Hadoop

Concepts

Data Processing

Author (1):

Tanmay Deshpande

Table of Contents (12) Chapters

Preface

1. Getting Started with Hadoop 2.X

2. Exploring HDFS

3. Mastering Map Reduce Programs

4. Data Analysis Using Hive, Pig, and Hbase

5. Advanced Data Analysis Using Hive

6. Data Import/Export Using Sqoop and Flume

7. Automation of Hadoop Tasks Using Oozie

8. Machine Learning and Predictive Analytics Using Mahout and R

9. Integration with Apache Spark

10. Hadoop Use Cases

Index

Introduction

Hadoop has been the primary platform for many people who deal with big data problems, and it is often described as the heart of big data. Its origins trace back to 2003 and 2004, when Google published research papers on the Google File System (GFS) and MapReduce. Hadoop was designed around the core ideas of these papers. With the growth of the Internet and social media, people gradually recognized the power that Hadoop had, and it soon became the leading platform for handling big data. After a lot of hard work from dedicated contributors and open source groups, Hadoop 1.0 was released, and the IT industry welcomed it with open arms.

Many companies started using Hadoop as the primary platform for their data warehousing and Extract-Transform-Load (ETL) needs. They deployed clusters with thousands of nodes and found that Hadoop ran into scalability issues once clusters grew beyond roughly 4,000 nodes. This was because the JobTracker could not handle that many TaskTrackers. There was also a need for high availability to make sure that clusters were reliable to use. These requirements gave birth to Hadoop 2.0.

In this introductory chapter, we are going to learn interesting recipes such as installing a single-node or multi-node Hadoop 2.0 cluster, benchmarking it, adding new nodes to existing clusters, and so on. So, let's get started.
