Data Lake for Enterprises

You're reading from Data Lake for Enterprises: Lambda Architecture for building enterprise data systems

Product type Paperback
Published in May 2017
Publisher Packt
ISBN-13 9781787281349
Length 596 pages
Edition 1st Edition
Authors (3): Pankaj Misra, Tomcy John, Vivek Mishra

Table of Contents (13 chapters)

Preface
1. Introduction to Data
2. Comprehensive Concepts of a Data Lake
3. Lambda Architecture as a Pattern for Data Lake
4. Applied Lambda for Data Lake
5. Data Acquisition of Batch Data using Apache Sqoop
6. Data Acquisition of Stream Data using Apache Flume
7. Messaging Layer using Apache Kafka
8. Data Processing using Apache Flink
9. Data Store Using Apache Hadoop
10. Indexed Data Store using Elasticsearch
11. Data Lake Components Working Together
12. Data Lake Use Case Suggestions

Enterprise's current state

As explained briefly in the previous sections, the current state of enterprise data in an organization can be summarized in bullet points as follows:

  • Conventional DW (Data Warehouse)/BI (Business Intelligence):
    • Refined/cleansed data is transferred from production business applications using ETL (extract, transform, load).
    • Data older than a certain period would already have been moved to storage from which it is hard to retrieve, such as magnetic tape.
    • Some of its notable deficiencies are as follows:
      • Only a subset of production data exists in the DW, in a cleansed format; adding any new element to the DW takes effort
      • Only a subset of that data stays in the DW; the rest gets transferred to permanent storage
      • Analysis is usually slow, and the DW is optimized only for queries that are, to an extent, predefined
  • Siloed Big Data:
    • Some departments would have taken the right step in building big data capabilities. But departments generally don't collaborate with each other, so this big data becomes siloed and doesn't deliver the value of true big data for the enterprise.
    • Some of its deficiencies are as follows:
      • Because of its siloed nature, the analyst is again constrained and not able to mix and match data between departments.
      • A good amount of money would have been spent to build and maintain/manage this, and over a period of time that cost is usually not sustainable.
  • Myriad non-connected applications:
    • There are a good number of applications on premises and in the cloud.
    • Applications, apart from churning out structured data, also produce unstructured data.
    • Some of the deficiencies are as follows:
      • The applications don't talk to each other
      • Even if they do talk, data scientists are not able to use the data effectively to transform the enterprise in a meaningful way
      • The same technology is replicated in each business application to handle many common aspects

We wouldn't say that creating or investing in a data lake is a silver bullet that solves all the aforementioned deficiencies. But it is definitely a step in the right direction, and every enterprise should at least spend some time discussing whether one is indeed required. If the answer is yes, don't deliberate over it too much; take the next step toward implementation.

A data lake is an enterprise initiative: it has to be built with the consent of all the stakeholders, and it should have buy-in from the top executives. It can definitely help find ways to improve the processes by which enterprises do business. It can help higher management know more about their business and can increase the success rate of the decision-making process.
