You're reading from Real-Time Big Data Analytics Design, process, and analyze large sets of complex data in real time

Product type Paperback

Published in Feb 2016

Publisher

ISBN-13 9781784391409

Length 326 pages

Edition 1st Edition

Languages

Java

Tools

Apache Spark

Concepts

Big Data

Author (1):

Shilpi Saxena

Table of Contents (12) Chapters

Preface

1. Introducing the Big Data Technology Landscape and Analytics Platform

2. Getting Acquainted with Storm

3. Processing Data with Storm

4. Introduction to Trident and Optimizing Storm Performance

5. Getting Acquainted with Kinesis

6. Getting Acquainted with Spark

7. Programming with RDDs

8. SQL Query Engine for Spark – Spark SQL

9. Analysis of Streaming Data Using Spark Streaming

10. Introducing Lambda Architecture

Index

Preface

Processing historical data from the past 10-20 years, performing analytics on it, and finally producing business insights is the most popular use case for modern enterprises.

Enterprises have been focusing on developing data warehouses (https://en.wikipedia.org/wiki/Data_warehouse), where they store the data fetched from every possible data source and leverage various BI tools to provide analytics over it. But developing a data warehouse is a complex process that requires a considerable investment of both money and time.

There is no doubt that the emergence of Hadoop and its ecosystem has provided a new paradigm, or architecture, for solving large data problems: a low-cost, scalable solution that processes terabytes of data in a few hours, work that earlier could have taken days. But this is only one side of the coin. Hadoop was meant for batch processing, while a number of other business use cases require analytics and business insights in real or near real time (sub-second SLAs). This came to be called real-time analytics (RTA) or near real-time analytics (NRTA), and was sometimes also termed "fast data", implying the ability to make near real-time decisions and enable "orders-of-magnitude" improvements in elapsed time to decisions for businesses.

A number of powerful, easy-to-use open source platforms have emerged to address these enterprise real-time analytics use cases. Two of the most notable are Apache Storm and Apache Spark, which offer real-time data processing and analytics capabilities to a much wider range of potential users. Both are projects of the Apache Software Foundation, and while the two tools provide overlapping capabilities, they have distinctive features and different roles to play.

Interesting, isn't it?

Let's move forward and jump into the nitty-gritty of real-time Big Data analytics with Apache Storm and Apache Spark. This book provides you with the skills required to quickly design, implement, and deploy your real-time analytics, using real-world examples of Big Data use cases.
