Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Modern Big Data Processing with Hadoop

You're reading from   Modern Big Data Processing with Hadoop Expert techniques for architecting end-to-end big data solutions to get valuable insights

Arrow left icon
Product type Paperback
Published in Mar 2018
Publisher Packt
ISBN-13 9781787122765
Length 394 pages
Edition 1st Edition
Languages
Concepts
Arrow right icon
Authors (3):
Arrow left icon
Manoj R Patil Manoj R Patil
Author Profile Icon Manoj R Patil
Manoj R Patil
Prashant Shindgikar Prashant Shindgikar
Author Profile Icon Prashant Shindgikar
Prashant Shindgikar
V Naresh Kumar V Naresh Kumar
Author Profile Icon V Naresh Kumar
V Naresh Kumar
Arrow right icon
View More author details
Toc

Table of Contents (12) Chapters Close

Preface 1. Enterprise Data Architecture Principles FREE CHAPTER 2. Hadoop Life Cycle Management 3. Hadoop Design Consideration 4. Data Movement Techniques 5. Data Modeling in Hadoop 6. Designing Real-Time Streaming Data Pipelines 7. Large-Scale Data Processing Frameworks 8. Building Enterprise Search Platform 9. Designing Data Visualization Solutions 10. Developing Applications Using the Cloud 11. Production Hadoop Cluster Deployment

The importance of metadata

Before we try to understand the importance of Metadata, let's try to understand what metadata is. Metadata is simply data about data. This sounds confusing as we are defining the definition in a recursive way.

In a typical big data system, we have these three levels of verticals:

  • Applications writing data to a big data system
  • Organizing data within the big data system
  • Applications consuming data from the big data system

This brings up a few challenges as we are talking about millions (even billions) of data files/segments that are stored in the big data system. We should be able to correctly identify the ownership, usage of these data files across the Enterprise.

Let's take an example of a TV broadcasting company that owns a TV channel; it creates television shows and broadcasts it to all the target audience over wired cable networks, satellite networks, the internet, and so on. If we look carefully, the source of content is only one. But it's traveling through all possible mediums and finally reaching the user’s Location for viewing on TV, mobile phone, tablets, and so on.

Since the viewers are accessing this TV content on a variety of devices, the applications running on these devices can generate several messages to indicate various user actions and preferences, and send them back to the application server. This data is pretty huge and is stored in a big data system.

Depending on how the data is organized within the big data system, it's almost impossible for outside applications or peer applications to know about the different types of data being stored within the system. In order to make this process easier, we need to describe and define how data organization takes place within the big data system. This will help us better understand the data organization and access within the big data system.

Let's extend this example even further and say there is another application that reads from the big data system to understand the best times to advertise in a given TV series. This application should have a better understanding of all other data that is available within the big data system. So, without having a well-defined metadata system, it's very difficult to do the following things:

  • Understand the diversity of data that is stored, accessed, and processed
  • Build interfaces across different types of datasets
  • Correctly tag the data from a security perspective as highly sensitive or insensitive data
  • Connect the dots between the given sets of systems in the big data ecosystem
  • Audit and troubleshoot issues that might arise because of data inconsistency
You have been reading a chapter from
Modern Big Data Processing with Hadoop
Published in: Mar 2018
Publisher: Packt
ISBN-13: 9781787122765
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image