With Spark 2.x, we have two new Spark computational abstractions:
- Data frames: These are distributed, resilient, fault-tolerant, in-memory data structures that handle only structured data, meaning they are designed to manage data that can be organized into fixed, typed columns. Though this may sound like a limitation compared to RDDs, which can handle any kind of unstructured data, in practice this structured abstraction makes it very easy to manipulate large volumes of structured data, much the way we are used to with an RDBMS.
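As a minimal sketch of the RDBMS-like, column-oriented style described above (assuming a local Spark session and a hypothetical `employees.csv` file with `name` and `salary` columns):

```scala
import org.apache.spark.sql.SparkSession

object DataFrameExample extends App {
  val spark = SparkSession.builder()
    .appName("DataFrameExample")
    .master("local[*]")   // local mode, for illustration only
    .getOrCreate()

  // Read structured data into a DataFrame; columns and types
  // are inferred from the header row and the data.
  val df = spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("employees.csv")   // hypothetical input file

  // Column-based operations, much like SQL over a table:
  df.select("name", "salary")
    .filter(df("salary") > 50000)
    .show()

  spark.stop()
}
```

The same query could also be expressed in SQL by registering the DataFrame as a temporary view, which is part of what makes the structured API feel familiar to RDBMS users.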
- Datasets: A dataset is an extension of the Spark data frame. It is a type-safe, object-oriented interface. For the sake of simplicity, one could say that a data frame is actually an untyped dataset. This newest API in Spark's programmatic abstraction actually leverages...
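The type-safe, object-oriented nature of the Dataset API can be sketched as follows (a minimal example, assuming a local Spark session and a hypothetical `Employee` case class):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case class acting as the typed schema for the Dataset.
case class Employee(name: String, salary: Double)

object DatasetExample extends App {
  val spark = SparkSession.builder()
    .appName("DatasetExample")
    .master("local[*]")   // local mode, for illustration only
    .getOrCreate()

  import spark.implicits._   // brings in the encoders for case classes

  // Build a typed Dataset[Employee] from in-memory sample data.
  val ds = Seq(Employee("Ann", 60000.0), Employee("Bob", 45000.0)).toDS()

  // Field access is checked at compile time, unlike the
  // string-based column references of the untyped DataFrame API:
  val wellPaid = ds.filter(_.salary > 50000).map(_.name)
  wellPaid.show()

  spark.stop()
}
```

A misspelled field such as `_.salry` here would fail at compile time, whereas the equivalent DataFrame expression `df("salry")` would only fail at runtime; this is the practical payoff of the type-safe interface.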