Mastering Hadoop 3: Big data processing at scale to unlock unique business insights

Product type Paperback

Published in Feb 2019

Publisher Packt

ISBN-13 9781788620444

Length 544 pages

Edition 1st Edition

Languages

Java

Tools

Hadoop

Concepts

Big Data

Authors (3):

Timothy Wong

Manish Kumar

Chanchal Singh

Table of Contents (21) Chapters

Preface

1. Section 1: Introduction to Hadoop 3

2. Journey to Hadoop 3

3. Deep Dive into the Hadoop Distributed File System

4. YARN Resource Management in Hadoop

5. Internals of MapReduce

6. Section 2: Hadoop Ecosystem

7. SQL on Hadoop

8. Real-Time Processing Engines

9. Widely Used Hadoop Ecosystem Components

10. Section 3: Hadoop in the Real World

11. Designing Applications in Hadoop

12. Real-Time Stream Processing in Hadoop

13. Machine Learning in Hadoop

14. Hadoop in the Cloud

15. Hadoop Cluster Profiling

16. Section 4: Securing Hadoop

17. Who Can Do What in Hadoop

18. Network and Data Security

19. Monitoring Hadoop

20. Other Books You May Enjoy

Node labels

As an organization's use of Hadoop grows over time, it brings more use cases onto the Hadoop platform. The data pipeline in an organization consists of multiple jobs. A Spark job may need machines with more RAM and more powerful processors, whereas a MapReduce job can run on less powerful machines. A cluster may therefore consist of different types of machines to save infrastructure costs.
A YARN node label is simply a marker attached to a machine, so that machines with the same label can be used for specific jobs. Nodes with more powerful processing capabilities can be given the same label, and jobs that require more powerful machines can then specify that label during submission. Each node can only have one label assigned to it, which means...
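
The labelling idea described above can be sketched as a small Java model. This is only an illustration under stated assumptions: the class `NodeLabelSketch`, its methods, and the node and label names (`node1`, `high-memory`, `standard`) are hypothetical and are not part of the YARN API. The sketch shows two properties from the text: a node carries a single label at a time, and a job that asks for a label is matched only to nodes carrying that label.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of YARN-style node labels. Hypothetical names; not the YARN API. */
public class NodeLabelSketch {
    // One entry per node: a node carries at most one label at a time.
    private final Map<String, String> labelByNode = new TreeMap<>();

    /** Labels a node; a second call for the same node replaces the first label. */
    public void setLabel(String node, String label) {
        labelByNode.put(node, label);
    }

    /** Returns, in sorted node order, the nodes a job requesting this label may use. */
    public List<String> nodesFor(String requestedLabel) {
        List<String> eligible = new ArrayList<>();
        for (Map.Entry<String, String> entry : labelByNode.entrySet()) {
            if (entry.getValue().equals(requestedLabel)) {
                eligible.add(entry.getKey());
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        NodeLabelSketch cluster = new NodeLabelSketch();
        cluster.setLabel("node1", "high-memory"); // assumed: powerful machines
        cluster.setLabel("node2", "high-memory");
        cluster.setLabel("node3", "standard");    // assumed: less powerful machine

        // A Spark job would request the powerful nodes, a MapReduce job the rest.
        System.out.println("Spark job: " + cluster.nodesFor("high-memory"));
        System.out.println("MapReduce job: " + cluster.nodesFor("standard"));
    }
}
```

Storing the label as a single map value per node is a deliberate choice here: it makes the one-label-per-node rule impossible to violate, because relabelling a node overwrites its previous label rather than adding a second one.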

Authors (3)

Wong

Dr. Timothy Wong is an IT veteran with 30 years of experience. He holds a PhD in networking from the University of Manchester Institute of Science and Technology, UK. He has deep experience in wireless and wireline networking, digital A/V, software development, sales, and consulting. He frequently works with telcos, banks, governments, and enterprises. He is also a professor of Big Data, Wireless, and IoT at Humber College, Toronto, Canada. As an entrepreneur, he has co-founded a number of companies in the past 20 years. He combines business and technical knowledge in pursuit of success.

Singh

Chanchal Singh has over half a decade of experience in product development and architecture design. He has worked closely with the leadership teams of various companies, including directors, CTOs, and founding members, to define their technical roadmaps. He is the founder of, and a speaker at, the meetup group Big Data and AI Pune Meetup Experience Speaks. He is a co-author of the book Building Data Streaming Application with Apache Kafka. He has a Bachelor's degree in Information Technology from the University of Mumbai and a Master's degree in Computer Application from Amity University. He was also part of the Entrepreneur Cell in IIT Mumbai. His LinkedIn profile can be found under the username Chanchal Singh.

Kumar

Manish Kumar works as Director of Technology and Architecture at VSquare. He has over 13 years' experience in providing technology solutions to complex business problems. He has worked extensively on web application development, IoT, big data, cloud technologies, and blockchain. Manish has co-authored three books (Mastering Hadoop 3, Artificial Intelligence for Big Data, and Building Streaming Applications with Apache Kafka).
