Comprehensive Concepts of a Data Lake

The concept of a Data Lake arose from challenges that enterprises faced in the way their data was handled, processed and stored. Initially, the individual applications in an enterprise each came to maintain huge amounts of data of their own, with almost no reuse by other applications in the same enterprise. This created information silos across applications. As the next step of evolution, these applications started exposing their data across the organization through a data mart access layer over the central data warehouse. While data marts solved one part of the problem, others persisted: data governance, data ownership and data accessibility all had to be resolved to make enterprise-relevant data more readily available. This is where...

Sherihan Sheriff Dec 12, 2017

An excellent guide for both beginners and seasoned professionals that gives a practical insight on building a data lake using Big data technologies. Looking forward to more similar work from the authors in future.

Amazon Verified review

aussiejim Sep 30, 2018

I like the diagrams that simplified the various concepts. All in all, I found this a useful resource.

Anonymous Jun 04, 2019

I am writing a detailed review in hopes that it will help others decide if this book is right for them. More importantly, I hope that the author will see these comments and correct some of the current issues in the next version.

I was looking for a book to increase my knowledge of data lake implementation patterns, with technical details on batch vs real time processing, data storage, and data processing strategies. I liked the outline and approach the author chose to discuss these topics (refer to TOC), and it did contain some useful information that I was able to apply to my situation. For anyone using the Apache tools it describes several of the major technologies and when to and when not to use them. I was able to apply these to other technologies as well.

The problem is that the book is full of bad grammar, misspelled words (e.g., “willn’t”), wordy/repetitive sentences (see example below), and sections where the pictures don’t match the accompanying text (e.g., the author refers to colors on a B/W picture). I give the book 3 ½ stars out of 5 in its current state. It would be a 4 if they had a tech writer proofread it, fix the grammar issues and rewrite some of the sentences to be easier to understand. With a second pass to fix consistency issues, it would be a solid 5.

Detailed example of wordy/repetitive sentences…

(Copied from Chapter 8 – Data Processing using Apache Flink)

The technology that we have shortlisted to do this very important job of processing data is Apache Flink. I have to say that this selection was quite difficult as we have another technology in mind, namely Apache Spark, which was really strong in this area and more matured. But we decided to go with Flink in the end considering its pros. However, we have also detailed Spark a bit as opposed to other chapters in which we have just named other options and left it, because of its significance in this space.

(2 pages later)

For covering our use case and to build Data Lake we use Apache Flink in this layer as the technology. Other strong technology choices namely Apache Spark will also be explained a bit as we do feel that this is an equally good choice, in this layer. This chapter dives deep into Flink, though.

(next page)

The technology choice in this layer was really tough for us. Apache Spark was initially our choice, but Apache Flink had something in it that made us think over and at the time of writing this book, the industry did have some pointers favoring Flink and this made us do the final choice as Flink. However, we could have implemented this layer using Spark and it would have worked well for sure.

After 50 pages of Flink-related discussions, there is a ½-page high-level overview of Apache Spark.

Dimitri Shvorob Apr 04, 2018

If it is "data lake" that piqued your interest, don't bother - as far as I can tell, it is just the current buzzword for a company's data estate. "Data Lake for Enterprises" is a big-data book, starting with a discussion of Nathan Marz's "lambda architecture" and continuing with a tour of a set of big-data technologies which could be used to flesh out elements of that architecture. The Manning-published "Big Data" by Marz and Warren immediately suggests itself as an alternative, and I am sure that others exist - it's too bad that the earlier reviews mention none. Unfortunately, I am not a big-data guy, and cannot offer competent advice. I can say that (a) Stephen Yegge's complaints are overblown - as could be expected from Packt, the book is sloppily written and never proofread, but it is not difficult to understand, (b) when skimmed, the book has made a decent impression.

VG Dec 22, 2017

Expected a lot more. It is lots of small bits and pieces of information trying to touch too many topics, mixing concepts (Very little of it) and implementation products (more of it).

Data Lake	Data Warehouse
Captures all types of data (structured, semi-structured and unstructured) in their most natural form from source systems	Captures structured information and processes it as it is acquired into a fixed model defined for data warehouse purposes
Possesses enough processing power to process and analyze all kinds of data and make it available for access	Processes structured data into a dimensional or reporting model for advanced reporting and analytics
A Data Lake usually contains more relevant information that has a good probability of access and can serve the operational needs of an enterprise	A Data Warehouse usually stores and retains...
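
The first row of the comparison above can be illustrated with a minimal Python sketch. It is not taken from the book: the sample events, the `WAREHOUSE_COLUMNS` model and both loader functions are hypothetical names chosen only to contrast keeping data in its natural form with shaping it into a fixed model on arrival.

```python
import json

# Hypothetical raw events from source systems; the shapes differ on purpose.
RAW_EVENTS = [
    '{"order_id": 1, "amount": 25.0, "currency": "USD"}',   # structured
    '{"order_id": 2, "amount": 10.5, "note": "gift wrap"}',  # semi-structured
    'free-text log line: customer called about order 2',     # unstructured
]

# Hypothetical fixed model for the warehouse side (an assumption for this sketch).
WAREHOUSE_COLUMNS = ("order_id", "amount", "currency")


def load_into_lake(events):
    """Lake-style capture: keep every record exactly as it arrived."""
    return list(events)


def load_into_warehouse(events):
    """Warehouse-style capture: shape each record into the fixed model on arrival."""
    rows = []
    for event in events:
        try:
            record = json.loads(event)
        except json.JSONDecodeError:
            continue  # unstructured input has no place in the fixed model
        if not isinstance(record, dict):
            continue  # only key/value records can be mapped to columns
        # Fields outside the model are dropped; missing ones become None.
        rows.append(tuple(record.get(col) for col in WAREHOUSE_COLUMNS))
    return rows


lake = load_into_lake(RAW_EVENTS)
warehouse = load_into_warehouse(RAW_EVENTS)
```

In this sketch the lake keeps all three events, including the free-text line, while the warehouse keeps only the two records that fit its columns and loses the `note` field. A real system would add storage, metadata and governance on both sides.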

Data Lake for Enterprises: Lambda Architecture for building enterprise data systems

What is a Data Lake?

Relevance to enterprises

How does a Data Lake help enterprises?

How Data Lake works?

Differences between Data Lake and Data Warehouse
