
From the book Data Lake for Enterprises: Lambda Architecture for building enterprise data systems

Product type: Paperback

Published: May 2017

Publisher: Packt

ISBN-13: 9781787281349

Length: 596 pages

Edition: 1st Edition

Languages: Java

Tools: Hadoop

Concepts: Data Processing

Authors (3):

Pankaj Misra

Tomcy John

Vivek Mishra


Table of Contents (13 chapters)

Preface

1. Introduction to Data

2. Comprehensive Concepts of a Data Lake

3. Lambda Architecture as a Pattern for Data Lake

4. Applied Lambda for Data Lake

5. Data Acquisition of Batch Data using Apache Sqoop

6. Data Acquisition of Stream Data using Apache Flume

7. Messaging Layer using Apache Kafka

8. Data Processing using Apache Flink

9. Data Store Using Apache Hadoop

10. Indexed Data Store using Elasticsearch

11. Data Lake Components Working Together

12. Data Lake Use Case Suggestions

Elasticsearch as a data source

In general, Elasticsearch shouldn't be used as a primary data store (a subjective view, we acknowledge). However, the question is largely use-case driven, and for some use cases Elasticsearch can serve as a data store. Elasticsearch falls into the NoSQL category of databases and does not provide the ACID guarantees of a typical relational data store, which are needed mostly for transaction-oriented use cases. It does, however, offer features such as optimistic locking and eventual consistency that make it a good fit for certain specific use cases. In a data lake implementation, it can act as a data store because the real data store (the system of record) remains with the source systems. In the event of a failure, the data can be warmed back into Elasticsearch from these source systems (in practice this is not that straightforward), or even from our Hadoop and back to...
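To make the optimistic locking mentioned above concrete, here is a minimal, self-contained Java sketch that simulates the idea in memory rather than calling Elasticsearch. Every write bumps a sequence number, and a conditional write is rejected if the caller's expected sequence number is stale. In Elasticsearch itself the equivalent check is made with the `if_seq_no` and `if_primary_term` parameters on index requests (older releases used a `version` parameter), and a conflict is reported as HTTP 409. The class and method names (`OptimisticStoreSketch`, `indexIfSeqNo`, `VersionConflictException`) are hypothetical, and the model deliberately ignores primary terms, shards and replication.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of optimistic concurrency control; NOT the Elasticsearch client API.
public class OptimisticStoreSketch {

    /** A stored document together with the sequence number of the write that produced it. */
    public static final class Versioned {
        public final String source;
        public final long seqNo;

        Versioned(String source, long seqNo) {
            this.source = source;
            this.seqNo = seqNo;
        }
    }

    /** Raised for a stale conditional write (Elasticsearch answers such a write with HTTP 409). */
    public static final class VersionConflictException extends RuntimeException {
        public VersionConflictException(String message) {
            super(message);
        }
    }

    private final Map<String, Versioned> docs = new HashMap<>();
    private long nextSeqNo = 0;

    /** Unconditional write: always succeeds and returns the new sequence number. */
    public synchronized long index(String id, String source) {
        long seq = nextSeqNo++;
        docs.put(id, new Versioned(source, seq));
        return seq;
    }

    /** Conditional write: applied only if the document is still at expectedSeqNo. */
    public synchronized long indexIfSeqNo(String id, String source, long expectedSeqNo) {
        Versioned current = docs.get(id);
        if (current == null || current.seqNo != expectedSeqNo) {
            throw new VersionConflictException("version conflict on document " + id);
        }
        return index(id, source);
    }

    public synchronized Versioned get(String id) {
        return docs.get(id);
    }

    public static void main(String[] args) {
        OptimisticStoreSketch store = new OptimisticStoreSketch();
        long readVersion = store.index("cust-1", "{\"tier\":\"silver\"}");

        // Two writers both read readVersion, then each tries a conditional update.
        store.indexIfSeqNo("cust-1", "{\"tier\":\"gold\"}", readVersion); // first writer succeeds
        try {
            store.indexIfSeqNo("cust-1", "{\"tier\":\"bronze\"}", readVersion); // stale, so rejected
        } catch (VersionConflictException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println("current: " + store.get("cust-1").source);
    }
}
```

No writer ever holds a lock while it works; a writer whose read has gone stale is simply told so and must re-read and retry. That keeps reads cheap, which suits the low-contention, search-oriented data a data lake typically puts into Elasticsearch, but it is no substitute for the multi-statement ACID transactions of a relational store.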
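The paragraph notes that after a failure the data can be warmed back into Elasticsearch from the systems of record. One common way to do that is a replay through the `_bulk` API, whose body is newline-delimited JSON: an action line followed by the document source, with the whole body ending in a newline. The Java sketch below only builds that request body; the class name `BulkWarmSketch`, the index name `customers` and the field names are hypothetical, records are assumed to be flat maps of strings, and actually sending the body (for example as a POST to `/_bulk` with `Content-Type: application/x-ndjson`) is left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper that turns source-system rows into an Elasticsearch _bulk request body.
public class BulkWarmSketch {

    /** Minimal JSON string encoding: quotes, backslashes and control characters are escaped. */
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Renders one flat record (string fields only) as a JSON object, keeping map order. */
    static String toJson(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append(jsonString(e.getKey())).append(':').append(jsonString(e.getValue()));
            first = false;
        }
        return sb.append('}').toString();
    }

    /**
     * For each record, emits an "index" action line that uses the source system's
     * primary key as _id, followed by the document itself on the next line.
     */
    static String buildBulkBody(String index, String idField, List<Map<String, String>> records) {
        StringBuilder body = new StringBuilder();
        for (Map<String, String> r : records) {
            body.append("{\"index\":{\"_index\":").append(jsonString(index))
                .append(",\"_id\":").append(jsonString(r.get(idField))).append("}}\n");
            body.append(toJson(r)).append('\n'); // every line, including the last, ends with '\n'
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("customer_id", "C-1001");
        row.put("name", "Jane \"JJ\" Doe");
        System.out.print(buildBulkBody("customers", "customer_id", List.of(row)));
    }
}
```

Taking `_id` from the source system's primary key is the design choice that matters here: an `index` action with an existing `_id` overwrites that document, so replaying the same extract twice does not create duplicates and a failed warm-up can simply be re-run. For large extracts the records would be sent in several smaller bulk requests rather than one body.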


About the authors

Tomcy John

Tomcy John lives in Dubai (United Arab Emirates), hails from Kerala (India), and is an enterprise Java specialist with a degree in engineering (B.Tech) and over 14 years of experience in several industries. He currently works as a principal architect in the core architecture team at Emirates Group IT. Prior to this, he worked with Oracle Corporation and Ernst & Young. He specializes in building enterprise-grade applications and acts as chief mentor and evangelist, helping the organization adopt new technologies as corporate standards. Outside of work, Tomcy mentors young developers and engineers and speaks at various forums as a technical evangelist on topics ranging from web and middleware to various persistence stores.


Mishra

Charit Mishra is an ICS/SCADA security professional. He works as a security architect for the critical infrastructure industry (oil and gas, energy and utilities, transport, telecom, and so on) and has extensive experience in security standards, frameworks, and technologies, with real hands-on experience in security. He holds leading industry certifications such as OSCP, CEH, CompTIA Security+, and CCNA R&S, as well as a master's degree in computer science. He regularly delivers professional training on critical infrastructure security internationally.


Pankaj Misra

Pankaj Misra has been a technology evangelist, holding a bachelor's degree in engineering, with over 16 years of experience across multiple business domains and technologies. He has been working with Emirates Group IT since 2015, and has worked with various other organizations in the past. He specializes in architecting and building multi-stack solutions and implementations. He has also been a speaker at technology forums in India and has built products with scale-out architecture that support high-volume, near-real-time data processing and near-real-time analytics.
