What do you get with eBook?

Instant access to your Digital eBook purchase

Download this book in EPUB and PDF formats

Access this title in our online reader with advanced features

DRM FREE - Read whenever, wherever and however you want

With the exponential growth in the amount of data being generated and increasingly advanced data-capturing capabilities, enterprises face the challenge of making sense of a mountain of raw data. On the batch processing front, Hadoop has emerged as the go-to framework for dealing with big data. Until recently, however, there was a void in frameworks for building real-time stream processing applications. Such applications have become an integral part of many businesses because they enable them to respond swiftly to events and adapt to changing situations. Examples include monitoring social media to analyze the public response to a newly launched product and predicting the outcome of an election based on the sentiment of election-related posts.

Organizations are collecting large volumes of data from external sources and want to process that data in real time to spot market trends, detect fraud, identify user behavior, and so on. The need for real-time processing is increasing day by day, and it calls for a real-time platform that supports the following features:

Scalable: The platform should be horizontally scalable without any downtime.
Fault tolerance: The platform should be able to keep processing data even after some of the nodes in a cluster go down.
No data loss: The platform should guarantee that every message is processed.
High throughput: The system should be able to handle millions of records per second and support messages of any size.
Easy to operate: The system should be easy to install and operate. Expanding a cluster should also be an easy process.
Multiple languages: The platform should support multiple languages, so that end users can write code in Python, Scala, Java, and so on. Code written in different languages should also be able to run inside a single cluster.
Cluster isolation: The system should support isolation so that dedicated processes can be assigned to dedicated machines for processing.

Key benefits

Exploit the various real-time processing functionalities offered by Apache Storm such as parallelism, data partitioning, and more

Integrate Storm with other Big Data technologies like Hadoop, HBase, and Apache Kafka

An easy-to-understand guide to effortlessly create distributed applications with Storm

Description

Apache Storm is a real-time Big Data processing framework that processes large amounts of data reliably, guaranteeing that every message will be processed. Storm allows you to scale as your data grows, making it an excellent platform for solving your big data problems. This extensive guide takes you from the basics to the advanced topics of Storm. The book begins with a detailed introduction to real-time processing and shows where Storm fits in to solve these problems. You’ll learn how to deploy Storm on clusters by writing a basic Storm Hello World example. Next, we’ll introduce you to Trident, and you’ll get a clear understanding of how to develop and deploy a Trident topology. We cover topics such as monitoring, Storm parallelism, the scheduler, and log processing in an easy-to-understand manner. You will also learn how to integrate Storm with other well-known Big Data technologies such as HBase, Redis, Kafka, and Hadoop to realize the full potential of Storm. With real-world examples and clear explanations, this book will give you a thorough mastery of Apache Storm. You will be able to use this knowledge to develop efficient, distributed real-time applications that cater to your business needs.

Who is this book for?

If you are a Java developer who wants to enter the world of real-time stream processing applications using Apache Storm, then this book is for you. No previous experience with Storm is required, as this book starts from the basics. After finishing this book, you will be able to develop not-so-complex Storm applications.

What you will learn

Understand the core concepts of Apache Storm and real-time processing

Follow the steps to deploy a multi-node Storm cluster

Create Trident topologies to support various message-processing semantics

Share your cluster effectively using Storm scheduling

Integrate Apache Storm with other Big Data technologies such as Hadoop, HBase, Kafka, and more

Monitor the health of your Storm cluster
