Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Modern Big Data Processing with Hadoop

You're reading from   Modern Big Data Processing with Hadoop Expert techniques for architecting end-to-end big data solutions to get valuable insights

Arrow left icon
Product type Paperback
Published in Mar 2018
Publisher Packt
ISBN-13 9781787122765
Length 394 pages
Edition 1st Edition
Languages
Concepts
Arrow right icon
Authors (3):
Arrow left icon
Manoj R Patil Manoj R Patil
Author Profile Icon Manoj R Patil
Manoj R Patil
Prashant Shindgikar Prashant Shindgikar
Author Profile Icon Prashant Shindgikar
Prashant Shindgikar
V Naresh Kumar V Naresh Kumar
Author Profile Icon V Naresh Kumar
V Naresh Kumar
Arrow right icon
View More author details
Toc

Table of Contents (12) Chapters Close

Preface 1. Enterprise Data Architecture Principles 2. Hadoop Life Cycle Management FREE CHAPTER 3. Hadoop Design Consideration 4. Data Movement Techniques 5. Data Modeling in Hadoop 6. Designing Real-Time Streaming Data Pipelines 7. Large-Scale Data Processing Frameworks 8. Building Enterprise Search Platform 9. Designing Data Visualization Solutions 10. Developing Applications Using the Cloud 11. Production Hadoop Cluster Deployment

Data as a Service

Data as a Service (DaaS) is a concept that has become popular in recent times due to the increase in adoption of cloud. When it comes to data. It might some a little confusing that how can data be added to as a service model?

DaaS offers great flexibility to users of the service in terms of not worrying about the scale, performance, and maintenance of the underlying infrastructure that the service is being run on. The infrastructure automatically takes care of it for us, but given that we are dealing with a cloud model, we have all the benefits of the cloud such as pay as you go, capacity planning, and so on. This will reduce the burden of data management.

If we try to understand this carefully we are taking out the data management part alone. But data governance should be well-defined here as well or else we will lose all the benefits of the service model.

So far, we are talking about the Service in the cloud concept. Does it mean that we cannot use this within the Enterprise or even smaller organizations? The answer is No. Because this is a generic concept that tells us the following things.

When we are talking about a service model, we should keep in mind the following things, or else chaos will ensue:

  • Authentication
  • Authorization
  • Auditing

This will guarantee that only well-defined users, IP addresses, and services can access the data exposed as a service.

Let's take an example of an organization that has the following data:

  • Employees
  • Servers and data centers
  • Applications
  • Intranet documentation sites

As you can see, all these are independent datasets. But, as a whole when we want the organization to succeed. There is lot of overlap and we should try to embrace the DaaS model here so that all these applications that are authoritative for the data will still manage the data. But for other applications, they are exposed as a simple service using REST API; therefore, this increases collaboration and fosters innovation within the organization.

Let's take further examples of how this is possible:

  • The team that manages all the employee data in the form of a database can provide a simple Data Service. All other applications can use this dataset without worrying about the underlying infrastructure on which this employee data is stored:
    • This will free the consumers of the data services in such a way that the consumers:
      • Need not worry about the underlying infrastructure
      • Need not worry about the protocols that are used to communicate with these data servers
      • Can just focus on the REST model to design the application
    • Typical examples of this would be:
      • Storing the employee data in a database like LDAP or the Microsoft Active directory
  • The team that manages the infrastructure for the entire organization can design their own system to keep off the entire hardware inventory of the organization, and can provide a simple data service. The rest of the organization can use this to build applications that are of interest to them:
    • This will make the Enterprise more agile
    • It ensures there is a single source of truth for the data about the entire hardware of the organization
    • It improves trust in the data and increases confidence in the applications that are built on top of this data
  • Every team in the organization might use different technology to build and deploy their applications on to the servers. Following this, they also need to build a data store that keeps track of the active versions of software that are deployed on the servers. Having a data source like this helps the organization in the following ways:
    • Services that are built using this data can constantly monitor and see where the software deployments are happening more often
    • The services can also figure out which applications are vulnerable and are actively deployed in production so that further action can be taken to fix the loopholes, either by upgrading the OS or the software
    • Understanding the challenges in the overall software deployment life cycle
    • Provides a single platform for the entire organization to do things in a standard way, which promotes a sense of ownership
  • Documentation is one of the very important things for an organization. Instead of running their own infrastructure, with the DaaS model, organizations and teams can focus on the documents that are related to their company and pay only for those. Here, services such as Google Docs and Microsoft Office Online are very popular as they give us flexibility to pay as we go and, most importantly, not worry about the technology required to build these.
    • Having such a service model for data will help us do the following:
      • Pay only for the service that is used
      • Increase or decrease the scale of storage as needed
      • Access the data from anywhere if the service is on the Cloud and connected to the internet
      • Access corporate resources when connected via VPN as decided by the Enterprise policy

In the preceding examples, we have seen a variety of applications that are used in Enterprises and how data as a model can help Enterprises in variety of ways to bring collaboration, innovation, and trust.

But, when it comes to big data, what can DaaS Do?

Just like all other data pieces, big data can also be fit into a DaaS model and provides the same flexibility as we saw previously:

  • No worry about the underlying hardware and technology
  • Scale the infrastructure as needed
  • Pay only for the data that is owned by the Enterprise
  • Operational and maintenance challenges are taken away
  • Data can be made geographically available for high availability
  • Integrated backup and recovery for DR requirements

With these few advantages, enterprises can be more agile and build applications that can leverage this data as service.

lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at €18.99/month. Cancel anytime