You're reading from Real-Time Big Data Analytics Design, process, and analyze large sets of complex data in real time

Product type Paperback

Published in Feb 2016

Publisher

ISBN-13 9781784391409

Length 326 pages

Edition 1st Edition

Languages

Java

Tools

Apache Spark

Concepts

Big Data

Author (1):

Shilpi Saxena

Table of Contents (12) Chapters

Preface

1. Introducing the Big Data Technology Landscape and Analytics Platform

2. Getting Acquainted with Storm

3. Processing Data with Storm

4. Introduction to Trident and Optimizing Storm Performance

5. Getting Acquainted with Kinesis

6. Getting Acquainted with Spark

7. Programming with RDDs

8. SQL Query Engine for Spark – Spark SQL

9. Analysis of Streaming Data Using Spark Streaming

10. Introducing Lambda Architecture

Index

The Big Data ecosystem

For a beginner, the Big Data landscape can be utterly confusing. There is a vast arena of technologies and an equally varied set of use cases. There is no single go-to solution: every use case calls for a custom solution, and this sprawling technology stack, combined with a lack of standardization, makes Big Data a difficult path for developers to tread. At the same time, a multitude of technologies exist that can draw meaningful insight out of data of this magnitude.

Let's begin with the basics. The environment for building any data analytics application should provide for the following:

Storing data
Enriching or processing data
Data analysis and visualization
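
To make these three responsibilities concrete, here is a minimal sketch in plain Java. It is only an illustration under stated assumptions: the class name `PipelineSketch`, its methods, and the sample events are hypothetical and do not come from any Big Data framework. An in-memory list stands in for a real data store, and printed counts stand in for visualization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, framework-free illustration of the three stages.
public class PipelineSketch {

    // Stage 1: storing data. A copy of the input list stands in for a real store.
    static List<String> store(List<String> rawEvents) {
        return new ArrayList<>(rawEvents);
    }

    // Stage 2: enriching or processing data.
    // Here that means trimming, lowercasing, and dropping empty records.
    static List<String> enrich(List<String> stored) {
        List<String> cleaned = new ArrayList<>();
        for (String event : stored) {
            String normalized = event.trim().toLowerCase(Locale.ROOT);
            if (!normalized.isEmpty()) {
                cleaned.add(normalized);
            }
        }
        return cleaned;
    }

    // Stage 3: data analysis. Count how often each event type occurs.
    static Map<String, Integer> analyze(List<String> enriched) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String event : enriched) {
            counts.merge(event, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed sample input; real applications would read from a source system.
        List<String> raw = Arrays.asList(" Click", "view", "click ", "", "VIEW", "click");
        Map<String, Integer> counts = analyze(enrich(store(raw)));
        // Stand-in for visualization: print one line per event type.
        counts.forEach((event, count) -> System.out.println(event + ": " + count));
    }
}
```

In a real deployment each stage would typically be handled by a dedicated tool rather than a method call, but the flow of data from storage through processing to analysis stays the same.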

When we get to specialization, there are specific Big Data tools and technologies available: for instance, ETL tools such as Talend and Pentaho; batch processing with Pig, Hive, and MapReduce; real-time processing with Storm, Spark, and so on. The list goes on. Here's a pictorial representation of the vast Big Data technology landscape, as per Forbes:

Source: http://www.forbes.com/sites/davefeinleib/2012/06/19/the-big-data-landscape/

It clearly depicts the various segments and verticals within the Big Data technology canvas:

Platforms such as Hadoop and NoSQL
Analytics such as HDP, CDH, EMC, Greenplum, DataStax, and more
Infrastructure such as Teradata, VoltDB, MarkLogic, and more
Infrastructure as a Service (IaaS) such as AWS, Azure, and more
Structured databases such as Oracle, SQL Server, DB2, and more
Data as a Service (DaaS) such as INRIX, LexisNexis, Factual, and more

Beyond that, we have a score of segments related to specific problem areas such as Business Intelligence (BI), analytics and visualization, advertisement and media, log data and vertical apps, and so on.
