Learning YARN

Moving beyond MapReduce - learn resource management and big data processing using YARN

Product type Paperback
Published in Aug 2015
Publisher
ISBN-13 9781784393960
Length 278 pages
Edition 1st Edition
Table of Contents

Preface
1. Starting with YARN Basics
2. Setting up a Hadoop-YARN Cluster
3. Administering a Hadoop-YARN Cluster
4. Executing Applications Using YARN
5. Understanding YARN Life Cycle Management
6. Migrating from MRv1 to MRv2
7. Writing Your Own YARN Applications
8. Dive Deep into YARN Components
9. Exploring YARN REST Services
10. Scheduling YARN Applications
11. Enabling Security in YARN
12. Real-time Data Analytics Using YARN
Index

Chapter 1. Starting with YARN Basics

In early 2006, Apache Hadoop was introduced as a framework for the distributed processing of large datasets stored across clusters of computers, using a simple programming model. Hadoop was developed to handle big data in the simplest and most cost-effective way possible. It consisted of a storage layer, the Hadoop Distributed File System (HDFS), and the MapReduce framework, which managed resource utilization and job execution on a cluster. Because it delivers high-performance parallel data analysis and runs on commodity hardware, Hadoop is used for big data analysis and batch processing of historical data through MapReduce programming.

With the exponential increase in the usage of social networking sites such as Facebook, Twitter, and LinkedIn, and e-commerce sites such as Amazon, a need arose for a framework that supports not only MapReduce batch processing but also real-time and interactive data analysis. Enterprises needed to run other applications on the same cluster so that its capabilities were utilized to the fullest. Hadoop's data storage framework was able to cope with the growing data size, but resource management became a bottleneck. The resource management framework for Hadoop needed a new design to meet the growing needs of big data.

YARN, an acronym for Yet Another Resource Negotiator, was introduced as a second-generation resource management framework for Hadoop and was added as a subproject of Apache Hadoop. Whereas MapReduce focuses only on batch processing, YARN is designed to provide a generic processing platform for data stored across a cluster, along with a robust cluster resource management framework.

In this chapter, we will cover the following topics:

  • Introduction to MapReduce v1
  • Shortcomings of MapReduce v1
  • An overview of the YARN components
  • The YARN architecture
  • How YARN satisfies big data needs
  • Projects powered by YARN