You're reading from Data Lake for Enterprises: Lambda Architecture for building enterprise data systems

Product type Paperback

Published in May 2017

Publisher Packt

ISBN-13 9781787281349

Length 596 pages

Edition 1st Edition

Languages

Java

Tools

Hadoop

Concepts

Data Processing

Authors (3):

Pankaj Misra

Tomcy John

Vivek Mishra

Table of Contents (13) Chapters

Preface

1. Introduction to Data

2. Comprehensive Concepts of a Data Lake

3. Lambda Architecture as a Pattern for Data Lake

4. Applied Lambda for Data Lake

5. Data Acquisition of Batch Data using Apache Sqoop

6. Data Acquisition of Stream Data using Apache Flume

7. Messaging Layer using Apache Kafka

8. Data Processing using Apache Flink

9. Data Store Using Apache Hadoop

10. Indexed Data Store using Elasticsearch

11. Data Lake Components Working Together

12. Data Lake Use Case Suggestions

Indexed Data Store using Elasticsearch

In the previous chapter on Hadoop, we persisted our data to Hadoop (HDFS). Querying that data from Hadoop quickly is difficult, and this is where an indexed data store such as Elasticsearch becomes significant in our Data Lake implementation.
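To make the idea of an indexed store concrete, here is a minimal sketch of an inverted index, the data structure used by Apache Lucene, on which Elasticsearch is built. It is an illustration only, not how Elasticsearch is implemented: the class name `InvertedIndexSketch`, its methods, and the sample documents are all hypothetical, and real engines add analysis, scoring, and on-disk segment storage on top of this.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration; not part of Elasticsearch or Lucene.
public class InvertedIndexSketch {
    // Maps each term to the sorted ids of the documents that contain it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Splits the text on non-alphanumeric characters and records each lowercased term. */
    public void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Returns the ids of documents containing the term using one map lookup, with no scan of the documents. */
    public Set<Integer> search(String term) {
        Set<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? Collections.emptySet() : Collections.unmodifiableSet(ids);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        // Sample documents are made up for this sketch.
        index.add(1, "Customer address updated in Dubai");
        index.add(2, "Customer contact created");
        index.add(3, "Address verified for new customer");
        System.out.println(index.search("address")); // prints [1, 3]
    }
}
```

The work of tokenizing is paid once at write time, so a term query becomes a lookup rather than a pass over every stored record, which is the trade-off that makes an indexed store attractive alongside HDFS.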

As in the other chapters in this part of the book, we start by explaining the layer in which this technology is used. We then explain why we chose this technology for this capability before looking at Elasticsearch and how it works. We cover enough detail for you to understand the technology; a full deep dive is beyond the scope of this book.
We then take you through a hands-on coding session, where you will first learn to install this technology and then...

Authors (3)

Tomcy John

Tomcy John lives in Dubai (United Arab Emirates), hails from Kerala (India), and is an enterprise Java specialist with a degree in Engineering (B Tech) and over 14 years of experience in several industries. He currently works as principal architect at Emirates Group IT, in their core architecture team. Prior to this, he worked with Oracle Corporation and Ernst & Young. His main specialization is building enterprise-grade applications, and he acts as chief mentor and evangelist to help incorporate new technologies as corporate standards in the organization. Outside of work, Tomcy works closely with young developers and engineers as a mentor and speaks at various forums as a technical evangelist on topics ranging from web and middleware to various persistence stores.

Mishra

Charit Mishra is an ICS/SCADA security professional. He works as a security architect for critical infrastructure industries (oil and gas, energy and utility, transport, telecom, and so on) and has extensive hands-on experience with security standards, frameworks, and technologies. He has obtained leading industry certifications, such as OSCP, CEH, CompTIA Security+, and CCNA R&S, and holds a master's degree in computer science. He regularly delivers professional training on critical infrastructure security internationally.

Pankaj Misra

Pankaj Misra has been a technology evangelist, holding a bachelor's degree in engineering, with over 16 years of experience across multiple business domains and technologies. He has been working with Emirates Group IT since 2015, and has worked with various other organizations in the past. He specializes in architecting and building multi-stack solutions and implementations. He has also been a speaker at technology forums in India and has built products with scale-out architecture that support high-volume, near-real-time data processing and near-real-time analytics.
