Mastering Apache Storm

Real-time big data streaming using Kafka, HBase and Redis

Product type: Paperback
Published in: Aug 2017
ISBN-13: 9781787125636
Length: 284 pages
Edition: 1st Edition
Author: Ankit Jain
Table of Contents

Preface
1. Real-Time Processing and Storm Introduction
2. Storm Deployment, Topology Development, and Topology Options
3. Storm Parallelism and Data Partitioning
4. Trident Introduction
5. Trident Topology and Uses
6. Storm Scheduler
7. Monitoring of Storm Cluster
8. Integration of Storm and Kafka
9. Storm and Hadoop Integration
10. Storm Integration with Redis, Elasticsearch, and HBase
11. Apache Log Processing with Storm
12. Twitter Tweet Collection and Machine Learning

Apache Storm

Apache Storm has emerged as a platform of choice for industry leaders building distributed, real-time data processing systems. It provides a set of primitives for developing applications that process very large amounts of data in real time and scale horizontally.

Storm is to real-time processing what Hadoop is to batch processing. It is open source software managed by the Apache Software Foundation, and companies such as Twitter, Yahoo!, and Flipboard have deployed it to meet their real-time processing needs. Storm was first developed by Nathan Marz at BackType, a company that provided social search applications. BackType was later acquired by Twitter, where Storm became a critical part of the infrastructure. Storm can be used for the following use cases:

  • Stream processing: Storm is used to process a stream of data and update a variety of databases in real time. The processing speed needs to match the input data speed.
  • Continuous computation: Storm can do continuous computation on data streams and stream the results to clients in real time. This might require processing each message as it comes in or creating small batches over a short time. An example of continuous computation is streaming trending topics on Twitter into browsers.
  • Distributed RPC: Storm can parallelize an intense query so that you can compute it in real time.
  • Real-time analytics: Storm can analyze data arriving from different data sources and respond to it as it arrives.
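
To make the continuous computation case more concrete, here is a minimal sketch in plain Python. It does not use the Storm API; it only illustrates the idea of grouping a stream into small batches and publishing updated trending topics after each one. The function names micro_batches and trending are hypothetical, and batches are cut by message count rather than by elapsed time so that the example stays deterministic.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def micro_batches(messages: Iterable, batch_size: int) -> Iterator[List]:
    """Group an incoming stream into small fixed-size batches.

    Assumption: batches are cut by count; a real system would more
    likely cut them by a short time window.
    """
    batch: List = []
    for message in messages:
        batch.append(message)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly partial, batch
        yield batch


def trending(messages: Iterable[str], batch_size: int,
             top_n: int) -> Iterator[List[Tuple[str, int]]]:
    """After each batch, emit the top_n hashtags seen so far."""
    counts: Counter = Counter()
    for batch in micro_batches(messages, batch_size):
        for message in batch:
            # Count every whitespace-separated token that starts with '#'.
            counts.update(
                word.lower() for word in message.split()
                if word.startswith("#")
            )
        yield counts.most_common(top_n)


if __name__ == "__main__":
    stream = [
        "#storm is fast",
        "learning #kafka",
        "#storm and #kafka",
        "#storm again",
    ]
    for snapshot in trending(stream, batch_size=2, top_n=2):
        print(snapshot)
```

Each snapshot stands in for the result that would be streamed to clients. In Storm, the source of messages and the counting step would be separate components running in parallel across a cluster rather than one loop in a single process.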

In this chapter, we will cover the following topics:

  • What is Storm?
  • Features of Storm
  • Architecture and components of a Storm cluster
  • Storm terminology
  • Programming language
  • Operation modes