Hadoop has been popular for its fast, performant batch processing of large volumes of varied, high-velocity data. However, there has always been an inherent need to handle data for near real-time applications as well.
While Flume did provide some level of stream-based processing in the Hadoop ecosystem, it required a considerable amount of implementation effort for custom processing. Most of Flume's source and sink implementations perform data ETL roles; any custom processing requirement in Flume meant implementing a custom sink.
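To make that implementation burden concrete, here is a minimal sketch of a Flume custom sink, using Flume's standard AbstractSink/Configurable extension points. The sink name and the logging behavior are hypothetical; a real sink would apply its own processing logic inside process():

```java
import org.apache.flume.Channel;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.Transaction;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;

// Hypothetical sink illustrating the boilerplate Flume requires
// even for trivial custom processing of events.
public class LoggingSink extends AbstractSink implements Configurable {

    @Override
    public void configure(Context context) {
        // Sink properties from the agent configuration would be read here.
    }

    @Override
    public Status process() throws EventDeliveryException {
        Channel channel = getChannel();
        Transaction txn = channel.getTransaction();
        txn.begin();
        try {
            Event event = channel.take();
            if (event == null) {
                // No event available; tell Flume to back off briefly.
                txn.commit();
                return Status.BACKOFF;
            }
            // Custom processing would go here; we only inspect the body.
            byte[] body = event.getBody();
            System.out.println("Processed event of " + body.length + " bytes");
            txn.commit();
            return Status.READY;
        } catch (Throwable t) {
            txn.rollback();
            throw new EventDeliveryException("Failed to process event", t);
        } finally {
            txn.close();
        }
    }
}
```

Note how much of the code is transaction and channel management rather than actual processing logic; every custom requirement pays this cost again.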
A more mature option for near real-time processing of data came with Spark Streaming, which works with HDFS and, being based on micro-batches as discussed earlier, provides greater capabilities than Flume, such as pipeline-based processing in near real time.
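As a rough illustration of that micro-batch, pipeline-based model, the following sketch uses Spark Streaming's Java API. The socket source, port, and five-second batch interval are assumptions chosen for a self-contained example; HDFS directories can be consumed similarly via textFileStream:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class MicroBatchSketch {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf()
                .setAppName("MicroBatchSketch")
                .setMaster("local[2]");
        // Each micro-batch covers 5 seconds of incoming data.
        JavaStreamingContext jssc =
                new JavaStreamingContext(conf, Durations.seconds(5));

        // Read lines from a socket source (hypothetical host/port).
        JavaReceiverInputDStream<String> lines =
                jssc.socketTextStream("localhost", 9999);

        // Pipeline-style transformations applied to every micro-batch:
        // drop empty lines, then count the remaining records per batch.
        JavaDStream<Long> counts = lines
                .filter(line -> !line.isEmpty())
                .count();
        counts.print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```

Unlike the Flume sink above, the processing pipeline here is expressed directly as chained transformations, and the framework handles batching and delivery.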
However, even if the data was...