Hadoop Essentials

Delve into the key concepts of Hadoop and get a thorough understanding of the Hadoop ecosystem

Product type Paperback
Published in Apr 2015
Publisher Packt
ISBN-13 9781784396688
Length 194 pages
Edition 1st Edition
Author: Shiva Achari

Big data use case patterns

There are many technological scenarios, and some of them follow similar patterns. It is a good idea to map scenarios to architectural patterns. Once these patterns are understood, they become the fundamental building blocks of solutions. The following sections discuss five such patterns.

Note

These solutions are not always optimal; the right choice may depend on the domain, the type of data, or other factors. The examples are meant to help visualize a problem and point toward a solution.

Big data as a storage pattern

Big data systems can be used as a storage layer or as a data warehouse, where data from multiple sources, even of different types, can be stored and used later. The usage scenario and use case are as follows:

  • Usage scenario:
    • Data getting continuously generated in large volumes
    • Need for preprocessing before getting loaded into the target system
  • Use case:
    • Machine data captured for subsequent cleansing can be merged into one or more big files and loaded into Hadoop for computation
    • Unstructured data across multiple sources should be captured for subsequent analysis on emerging patterns
    • Data loaded in Hadoop should be processed and filtered; depending on the data, the result can be stored in a data warehouse, Hadoop, or any NoSQL system

The storage pattern is shown in the following figure:

Big data as a storage pattern
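
The merge step in the use case above can be sketched as follows. This is a minimal illustration in plain Python, not the book's implementation: the function name `merge_small_files` and the `*.log` naming convention are assumptions made for the example. It concatenates many small machine-data files into one large file, which could then be copied into Hadoop in a separate step.

```python
import tempfile
from pathlib import Path


def merge_small_files(source_dir: Path, merged_path: Path) -> int:
    """Concatenate every *.log file in source_dir into merged_path.

    Returns the number of lines written. The *.log pattern is an
    assumption for this sketch; merged_path should not itself match it.
    """
    lines_written = 0
    with merged_path.open("w", encoding="utf-8") as merged:
        # Sort so the merged output is deterministic.
        for part in sorted(source_dir.glob("*.log")):
            with part.open(encoding="utf-8") as handle:
                for line in handle:
                    # Ensure every record ends with exactly one newline.
                    merged.write(line if line.endswith("\n") else line + "\n")
                    lines_written += 1
    return lines_written
```

Merging before loading matters because Hadoop generally handles a few large files more efficiently than a large number of small ones.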

Big data as a data transformation pattern

Big data systems can be designed to perform transformation as part of the data loading and cleansing activity, and many transformations can run faster than on traditional systems because of parallelism. Transformation is one phase of the Extract–Transform–Load (ETL) process used for data ingestion and cleansing. The usage scenario and use case are as follows:

  • Usage scenario
    • A large volume of raw data to be preprocessed
    • Data type includes structured as well as non-structured data
  • Use case
    • Evolution of ETL (Extract–Transform–Load) tools, for example, Pentaho and Talend, to leverage big data. ELT (Extract–Load–Transform) is also gaining ground in Hadoop: raw data is loaded first, which is faster, and cleansing then runs as a parallel process that cleans and transforms the input

The data transformation pattern is shown in the following figure:

Big data as a data transformation pattern
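
The idea of cleansing already-loaded data in parallel can be sketched in a few lines. The example below is only a structural illustration under stated assumptions: the cleansing rule (trim and lowercase comma-separated fields) and the names `clean_record`, `clean_partition`, and `transform` are hypothetical, and a thread pool stands in for the cluster-wide parallelism that Hadoop would provide.

```python
from concurrent.futures import ThreadPoolExecutor


def clean_record(raw: str) -> str:
    """Trim whitespace and lowercase each comma-separated field."""
    return ",".join(field.strip().lower() for field in raw.split(","))


def clean_partition(partition: list[str]) -> list[str]:
    """Clean one partition of raw records, dropping blank lines."""
    return [clean_record(record) for record in partition if record.strip()]


def transform(partitions: list[list[str]], workers: int = 4) -> list[str]:
    """Clean every partition concurrently and flatten the results.

    pool.map preserves the order of the input partitions.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        cleaned = pool.map(clean_partition, partitions)
        return [record for part in cleaned for record in part]
```

Each partition is cleaned independently of the others, which is what allows the work to be spread across many nodes in a real cluster.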

Big data for a data analysis pattern

Data analytics is of wide interest in big data systems, where huge amounts of data can be analyzed to generate statistical reports and insights that are useful to the business and for understanding patterns. The usage scenario and use case are as follows:

  • Usage scenario
    • Improved response time for detection of patterns
    • Data analysis for non-structured data
  • Use case
    • Fast turnaround for machine data analysis (for example, analysis of seismic data)
    • Pattern detection across structured and non-structured data (for example, fraud analysis)
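
As a toy illustration of pattern detection for fraud analysis, the sketch below flags transactions that are far larger than the typical amount for the same account. The rule (an amount greater than `factor` times the account's median) and the function name `flag_outliers` are assumptions chosen for the example; real fraud analysis would use far richer signals and would run across a cluster rather than in a single process.

```python
from collections import defaultdict
from statistics import median


def flag_outliers(transactions, factor=5.0):
    """Return (account, amount) pairs whose amount exceeds
    factor times the median amount for that account."""
    by_account = defaultdict(list)
    for account, amount in transactions:
        by_account[account].append(amount)

    # Compute each account's typical amount once.
    typical = {account: median(amounts) for account, amounts in by_account.items()}

    return [
        (account, amount)
        for account, amount in transactions
        if amount > factor * typical[account]
    ]
```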

Big data for data in a real-time pattern

Big data systems integrated with streaming libraries and systems can handle large-scale real-time data processing. Real-time processing for large and complex requirements poses many challenges, such as performance, scalability, availability, resource management, and low latency. Some streaming technologies, such as Storm and Spark Streaming, can be integrated with YARN. The usage scenario and use case are as follows:

  • Usage scenario
    • Managing the action to be taken based on continuously changing data in real time
  • Use case
    • Automated process control based on real-time data from manufacturing equipment
    • Real-time changes to plant operations based on events from business systems such as Enterprise Resource Planning (ERP) systems

The data in a real-time pattern is shown in the following figure:

Big data for data in a real-time pattern
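
The usage scenario, taking an action based on continuously changing data, can be sketched with a sliding window over a stream of sensor readings. This is a single-process illustration only; the function name `control_actions`, the window size, the limit, and the action labels are all hypothetical, and a production system would run equivalent logic inside a streaming engine such as Storm or Spark Streaming.

```python
from collections import deque


def control_actions(readings, window=3, limit=80.0):
    """Yield one action per reading, based on the moving average
    of the most recent `window` readings."""
    recent = deque(maxlen=window)  # old readings fall off automatically
    for value in readings:
        recent.append(value)
        average = sum(recent) / len(recent)
        yield "throttle" if average > limit else "ok"
```

Because the function is a generator, it emits a decision as each reading arrives instead of waiting for the whole data set, which is the essential difference from batch processing.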

Big data for a low latency caching pattern

Big data systems can be tuned, as a special case, to act as a low latency system for workloads where reads far outnumber updates. Data can then be fetched faster and kept in memory, which further improves performance and avoids overheads. The usage scenario and use case are as follows:

  • Usage scenario
    • Reads are far higher in ratio to writes
    • Reads require very low latency and a guaranteed response
    • Distributed location-based data caching
  • Use case
    • Order promising solutions
    • Cloud-based identity and SSO
    • Low latency real-time personalized offers on mobile

The low latency caching pattern is shown in the following figure:

Big data for a low latency caching pattern
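
A read-heavy workload of this kind is often served by a read-through cache held in memory. The sketch below is a minimal, single-process illustration; the class name `ReadThroughCache` and its `loader` callback, which stands in for a slow backing store, are assumptions made for the example and do not correspond to any particular product.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Serve reads from memory and call the loader only on a miss."""

    def __init__(self, loader: Callable[[str], str]) -> None:
        self._loader = loader            # hypothetical slow backing store
        self._store: Dict[str, str] = {}
        self.misses = 0

    def get(self, key: str) -> str:
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._loader(key)
        return self._store[key]

    def invalidate(self, key: str) -> None:
        """Drop a key after the underlying data has been updated."""
        self._store.pop(key, None)
```

When updates are rare, almost every read is answered from memory, and invalidation on update keeps the cached copy from going stale.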

Some of the technology stacks that are widely used according to the layer and framework are shown in the following image:

Technology stacks by layer and framework