Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds
Arrow up icon
GO TO TOP
Data Lake Development with Big Data

You're reading from   Data Lake Development with Big Data Explore architectural approaches to building Data Lakes that ingest, index, manage, and analyze massive amounts of data using Big Data technologies

Arrow left icon
Product type Paperback
Published in Nov 2015
Publisher
ISBN-13 9781785888083
Length 164 pages
Edition 1st Edition
Languages
Tools
Concepts
Arrow right icon
Toc

Preface

The book Data Lake Development with Big Data is a practical guide to help you learn the essential architectural approaches to design and build Data Lakes. It walks you through the various components of Data Lakes, such as data intake, management, consumption, and governance with a specific focus on practical implementation scenarios.

Data Lake is a highly scalable data platform for better search, analytical processing, and cheaper storage of huge volumes of any structured data acquired from disparate sources.

Traditional Data Management systems are constrained by data silos, upfront data modeling, rigid data structures, and schema-based write approaches while storing and processing data. This hampers the holistic analysis of data residing in multiple silos and excludes unstructured data sources from analysis. The data is generally modeled to answer known business questions.

With Data Lake, there are no more data silos; all the data can be utilized to get a coherent view that can power a new generation of data-aware analytics applications. With Data Lake, you don't have to know all the business questions in advance, as the data can be modeled later using the schema-less approach and it is possible to ask complex far-reaching questions on all the data at any time to find out hidden patterns and complex relationships in the data.

After reading this book, you will be able to address the shortcoming of traditional data systems through the best practices highlighted in this book for building Data Lake. You will understand the complete lifecycle of architecting/building Data Lake with Big Data technologies such as Hadoop, Storm, Spark, and Splunk. You will gain a comprehensive knowledge of various stages in Data Lake such as data intake, data management, and data consumption with focus on the practical use cases at each stage. You will benefit from the book's detailed coverage of data governance, data security, data lineage tracking, metadata management, data provisioning, and consumption.

As Data Lake is such an advanced complex topic, we are honored and excited to author the first book of its kind in the world. However, at the same time, as the topic being so vast and as there is no one-size-fits-all kind of Data Lake architecture, it is very challenging to appeal to a wide audience footprint. As it is a mini series book, which limits the page count, it is extremely difficult to cover every topic in detail without breaking the ceiling. Given these constraints, we have taken a reader-centric approach in writing this book because the broader understanding of the overall concept of Data Lake is far more important than the in-depth understanding of all the technologies and architectural possibilities that go into building Data Lake.

Using this guiding principle, we refrained from the in-depth coverage of any single topic, because we could not possibly do justice to it. At the same time we made efforts to organize chapters to mimick the sequential flow of data in a typical organization so that it is intuitive for the reader to quickly grasp the concepts of Data Lake from an organizational data flow perspective. In order to make the abstract concepts relatable to the real world, we have followed a use case-based approach where practical implementation scenarios of each key Data Lake component are explained. This we believe will help the reader quickly understand the architectural implications of various Big Data technologies that are used for building these components.

lock icon The rest of the chapter is locked
Next Section arrow right
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image