
Mastering Hadoop 3: Big data processing at scale to unlock unique business insights

Product type: Paperback
Published: Feb 2019
Publisher: Packt
ISBN-13: 9781788620444
Length: 544 pages
Edition: 1st Edition
Languages: Java
Tools: Hadoop
Concepts: Big Data

Authors (3): Timothy Wong, Manish Kumar, Chanchal Singh


Table of Contents (21 chapters)

Preface

1. Section 1: Introduction to Hadoop 3

2. Journey to Hadoop 3

3. Deep Dive into the Hadoop Distributed File System

4. YARN Resource Management in Hadoop

5. Internals of MapReduce

6. Section 2: Hadoop Ecosystem

7. SQL on Hadoop

8. Real-Time Processing Engines

9. Widely Used Hadoop Ecosystem Components

10. Section 3: Hadoop in the Real World

11. Designing Applications in Hadoop

12. Real-Time Stream Processing in Hadoop

13. Machine Learning in Hadoop

14. Hadoop in the Cloud

15. Hadoop Cluster Profiling

16. Section 4: Securing Hadoop

17. Who Can Do What in Hadoop

18. Network and Data Security

19. Monitoring Hadoop

20. Other Books You May Enjoy


SQL on Hadoop

Hadoop has traditionally been used as a distributed file system that can process large volumes of data using distributed algorithms. However, as it has grown popular among non-programmers and business analysts, there is a need to read and manipulate large numbers of records through simple, well-known interfaces. SQL has long been popular with non-programmers and data analysts because of its simple constructs and easy-to-understand, declarative syntax. Because Hadoop serves as storage for large volumes of data, and because exploring that data is one of its key use cases, SQL is a natural fit. With these goals in mind, many SQL engines have been developed to process and explore data stored in the Hadoop Distributed File System (HDFS). Most of these SQL-on-Hadoop engines are open source. We will look at them one by one in the following sections.
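To make the link between a SQL query and Hadoop's distributed processing concrete, here is a minimal, illustrative Java sketch of how an engine could decompose a simple aggregate query into map, shuffle, and reduce phases. Some SQL-on-Hadoop engines, Hive among them, originally executed queries as MapReduce jobs in this spirit. The class name `GroupByCountSketch`, the `logs` table, and the `level` column are hypothetical; in-memory lists stand in for HDFS input and for the framework's shuffle. This is not the code of any real engine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only (hypothetical class, not part of any real engine):
// shows how SELECT key, COUNT(*) FROM t GROUP BY key
// can be decomposed into map, shuffle, and reduce phases.
public class GroupByCountSketch {

    // Map phase: emit (key, 1) for every input row.
    static List<Map.Entry<String, Long>> map(List<String[]> rows, int keyColumn) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String[] row : rows) {
            pairs.add(Map.entry(row[keyColumn], 1L));
        }
        return pairs;
    }

    // Shuffle phase: group emitted values by key (Hadoop's framework does this between phases).
    static Map<String, List<Long>> shuffle(List<Map.Entry<String, Long>> pairs) {
        Map<String, List<Long>> groups = new TreeMap<>();
        for (Map.Entry<String, Long> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce phase: summing the 1s for each key yields COUNT(*).
    static Map<String, Long> reduce(Map<String, List<Long>> groups) {
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, List<Long>> g : groups.entrySet()) {
            long sum = 0;
            for (long v : g.getValue()) {
                sum += v;
            }
            result.put(g.getKey(), sum);
        }
        return result;
    }

    // Runs the three phases in sequence, as an engine might for a GROUP BY query.
    static Map<String, Long> groupByCount(List<String[]> rows, int keyColumn) {
        return reduce(shuffle(map(rows, keyColumn)));
    }

    public static void main(String[] args) {
        // Hypothetical rows of a "logs" table: (id, level).
        List<String[]> rows = List.of(
            new String[] {"1", "error"},
            new String[] {"2", "info"},
            new String[] {"3", "error"});
        // Equivalent to: SELECT level, COUNT(*) FROM logs GROUP BY level
        System.out.println(groupByCount(rows, 1));
    }
}
```

Running `main` prints `{error=2, info=1}`. A real engine distributes each phase across cluster nodes and moves intermediate data over the network; the sketch keeps only the grouping logic so that the mapping from the `GROUP BY` clause to the shuffle step stays visible.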

In this chapter, we will cover the...


About the Authors

Singh


Wong

Dr. Timothy Wong is an IT veteran of 30 years. He holds a PhD in networking from the University of Manchester Institute of Science and Technology, UK. He has deep experience in wireless and wireline networking, digital A/V, software development, sales, and consulting. He frequently works with telcos, banks, governments, and enterprises. He is also a professor of Big Data, Wireless, and IoT at Humber College, Toronto, Canada. As an entrepreneur, he has co-founded a number of companies over the past 20 years. He combines business and technical knowledge in the pursuit of success.


Kumar

Ashish Kumar is a seasoned data science professional, a published author, and a thought leader in the field of data science and machine learning. An IIT Madras graduate and a Young India Fellow, he has around 7 years of experience implementing and deploying data science and machine learning solutions for challenging industry problems, in both hands-on and leadership roles. Natural language processing, IoT analytics, R Shiny product development, and ensemble ML methods are his core areas of expertise. He is fluent in Python and R and teaches a popular ML course at Simplilearn. When not crunching data, Ashish sneaks off to the nearest hip beach and enjoys the company of his Kindle. He also trains and mentors data science aspirants and fledgling start-ups.
