Real-Time Big Data Analytics: Design, process, and analyze large sets of complex data in real time

Product type: Paperback
Published: Feb 2016
ISBN-13: 9781784391409
Length: 326 pages
Edition: 1st Edition
Author: Shilpi Saxena

Real-time processing

Now that we have talked extensively about Big Data processing and persistence in the context of distributed, batch-oriented systems, the next obvious topic is real-time or near real-time processing. Batch processing works through huge datasets offline. Real-time stream processing, by contrast, runs on the most current set of data, so we operate in the dimension of now or the immediate past; examples are credit card fraud detection, security monitoring, and so on. Latency is a key aspect of these analytics.

The two operative words here are velocity and latency, and that's where Hadoop and related distributed batch processing systems fall short. They are designed to deliver in batch mode and can't operate at latencies of milliseconds. In use cases where we need accurate results in fractions of a second, for example, credit card fraud detection, business activity monitoring, and so on, we need a Complex Event Processing (CEP) engine to process events and derive results at lightning-fast speed.
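To make the idea of evaluating events as they arrive more concrete, here is a minimal sketch of one kind of rule a CEP-style engine might apply to card transactions: flag a card when it produces too many transactions inside a short sliding window. The class name, the window length, and the threshold are all hypothetical and chosen only for illustration; a real fraud rule set would be far richer.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60       # hypothetical look-back window
MAX_TXNS_IN_WINDOW = 3    # hypothetical threshold, not a real fraud rule


class VelocityRule:
    """Flags a card that transacts more than max_txns times within the window."""

    def __init__(self, window_seconds=WINDOW_SECONDS, max_txns=MAX_TXNS_IN_WINDOW):
        self.window_seconds = window_seconds
        self.max_txns = max_txns
        self._recent = defaultdict(deque)  # card_id -> timestamps still in the window

    def on_event(self, card_id, timestamp):
        """Process one transaction as it arrives; return True if it should be flagged."""
        recent = self._recent[card_id]
        recent.append(timestamp)
        # Evict timestamps that have fallen out of the sliding window.
        while recent and timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_txns


# (card_id, timestamp in seconds), assumed to arrive in time order.
events = [
    ("card-1", 0), ("card-1", 10), ("card-2", 15),
    ("card-1", 20), ("card-1", 25), ("card-1", 200),
]
rule = VelocityRule()
flags = [rule.on_event(card_id, ts) for card_id, ts in events]
```

Each event is judged the moment it arrives, using only a small amount of per-card state, which is what keeps the decision latency low compared with re-scanning a batch.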

Storm, initially a project from the house of Twitter, has graduated to the league of Apache and was rechristened from Twitter Storm to Apache Storm. It was the brainchild of Nathan Marz and has since been adopted by Hadoop distributions such as CDH, HDP, and so on.

Apache Storm is a highly scalable, distributed, fast, and reliable real-time computing system designed to process high-velocity data. It helps the developer create a data flow model in which tuples flow continuously through a topology (a collection of processing components). Data can be ingested into Storm using distributed messaging queues such as Kafka, RabbitMQ, and so on. Trident is another layer of abstraction over the Storm API that brings microbatching capabilities to it. Cassandra complements Storm's compute capability by providing lightning-fast reads and writes, and this is the best combination available as of now for a data store with Storm.
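The data flow model, in which tuples move continuously through a chain of processing components, can be illustrated with a toy pipeline. The sketch below is plain Python built on generators and does not use the Storm API; the function names borrow Storm's spout (source) and bolt (processing step) terminology purely for illustration, and everything runs in a single process rather than on a cluster.

```python
from collections import Counter


def sentence_spout(sentences):
    """Source component: emits one single-field tuple per incoming sentence."""
    for sentence in sentences:
        yield (sentence,)


def split_bolt(stream):
    """Processing component: splits each sentence tuple into lowercase word tuples."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word.lower(),)


def count_bolt(stream):
    """Processing component: keeps a running count and emits (word, count) tuples."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])


def run_topology(sentences):
    """Wires the components together and drains the stream into a result dict."""
    stream = count_bolt(split_bolt(sentence_spout(sentences)))
    latest = {}
    for word, count in stream:
        latest[word] = count  # the newest tuple for a word carries its current count
    return latest


result = run_topology(["storm processes streams", "Storm processes tuples"])
```

Because each component pulls from the previous one lazily, a word count is updated as soon as its sentence arrives, rather than after the whole input has been collected.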

Let's take a closer look at a couple of real-time, real-world use cases in various industrial segments.

The telecoms or cellular arena

We are living in an era where cell phones are no longer merely calling devices. They have evolved into smartphones that put not just calling but also data, photographs, tracking, GPS, and so on into the hands of consumers. The data generated by cell phones is no longer just call data; the typical CDR (short for Call Data Record) captures voice, data, and SMS transactions. Voice and SMS transactions have existed for more than a decade and are predominantly structured because they follow telecom protocols and standards used worldwide; for example, CIBER, SMPP, SMSC, and so on. However, the data or IP traffic flowing in and out of these smart devices is largely unstructured and high in volume. It could be a music track, a picture, a tweet, or just about anything in the data dimension. CDR processing and billing is generally a batch job, but a lot of other things are real-time:

  • Geo-tracking of the device: Have you noticed how quickly we get an SMS whenever we cross a state border?
  • Usage and alerts: Have you noticed how accurate and efficient the alert is that tells you that you are approaching your broadband consumption limit and suggests that you top up?
  • Prepaid mobile cards: If you have ever used a prepaid system, you must have been awed by the super-efficient charge-tracking system they have in place.
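The usage alerts described in the list above amount to keeping a running total per subscriber and reacting the moment a threshold is crossed. The following is a minimal sketch under assumed values: the function name, the 80 percent warning ratio, and the quota are hypothetical and are not taken from any operator's system.

```python
from collections import defaultdict


def usage_alerts(events, quota_mb, warn_ratio=0.8):
    """Yield (subscriber, level) the first time a subscriber crosses each threshold.

    events is an iterable of (subscriber, megabytes_used) records in arrival order.
    """
    used = defaultdict(float)   # subscriber -> megabytes consumed so far
    sent = defaultdict(set)     # subscriber -> alert levels already sent
    for subscriber, megabytes in events:
        used[subscriber] += megabytes
        ratio = used[subscriber] / quota_mb
        for level, threshold in (("warning", warn_ratio), ("exhausted", 1.0)):
            if ratio >= threshold and level not in sent[subscriber]:
                sent[subscriber].add(level)  # make sure each alert fires only once
                yield (subscriber, level)


events = [("alice", 500), ("bob", 100), ("alice", 350), ("alice", 200), ("bob", 50)]
alerts = list(usage_alerts(events, quota_mb=1000))
```

Prepaid charge tracking follows the same shape, with a balance that is decremented per event instead of a usage counter that is incremented.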

Transportation and logistics

Transportation and logistics is another segment that's using real-time analytics on vehicular data for transportation, logistics, and intelligent traffic management. Here's an example from McKinsey's report that details how Big Data and real-time analytics are helping to handle traffic congestion on a major highway in Tel Aviv, Israel. Here's what they actually do: they monitor the toll receipts constantly and, during peak hours, hike the toll prices to avert congestion. This acts as a deterrent for users. Once the congestion eases during non-peak hours, the toll rates are reduced.
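The pricing loop described here can be sketched as a rolling count of recent toll passages that switches between two prices. This is only an illustration of the idea: the class name, the prices, the window length, and the congestion threshold are hypothetical, and the report does not describe the actual pricing formula.

```python
from collections import deque


class DynamicToll:
    """Raises the toll while the recent passage count signals congestion."""

    def __init__(self, base_price=5.0, peak_price=10.0,
                 window_seconds=300, congestion_threshold=3):
        self.base_price = base_price
        self.peak_price = peak_price
        self.window_seconds = window_seconds
        self.congestion_threshold = congestion_threshold
        self._passages = deque()  # timestamps of passages inside the window

    def _expire(self, now):
        """Drop passages that are older than the rolling window."""
        while self._passages and now - self._passages[0] >= self.window_seconds:
            self._passages.popleft()

    def record_passage(self, timestamp):
        """Register one vehicle passing the toll at the given time (seconds)."""
        self._passages.append(timestamp)
        self._expire(timestamp)

    def current_price(self, now):
        """Return the peak price while traffic is heavy, else the base price."""
        self._expire(now)
        if len(self._passages) >= self.congestion_threshold:
            return self.peak_price
        return self.base_price


toll = DynamicToll()
for ts in (0, 60, 120):
    toll.record_passage(ts)
peak = toll.current_price(130)       # three passages in the last five minutes
off_peak = toll.current_price(1000)  # the window has emptied again
```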

Many more use cases can be built around the data from check-posts/tolls to manage traffic intelligently, thus preventing congestion and making better use of public infrastructure.

The connected vehicle

An idea that was still in the realm of fiction until the last decade is now a reality that's actively used by consumers. GPS and Google Maps are no longer news; they have been absorbed into everyday life and are heavily used.

My car's control unit has telemetry devices that capture various KPIs, such as engine temperature, fuel consumption pattern, RPM, and so on, and all this information is used by the manufacturer for analysis. In some cases, the user can also set thresholds on these KPIs and receive alerts when they are crossed.

The financial sector

This is the sector that's emerging as the biggest consumer of real-time analytics, for very obvious reasons. The volume of data is huge and changes quickly, and the impact of analytics and its results comes down to money. This sector needs real-time instruments for rapid and precise analysis of data from stock exchanges, various financial institutions, market prices and fluctuations, and so on.
