Preface
Processing historical data for the past 10-20 years, performing analytics, and finally producing business insights is the most popular use case for today's modern enterprises.
Enterprises have been focusing on developing data warehouses (https://en.wikipedia.org/wiki/Data_warehouse) where they want to store the data fetched from every possible data source and leverage various BI tools to provide analytics over the data stored in these data warehouses. But developing data warehouses is a complex, time consuming, and costly process, which requires a considerable investment, both in terms of money and time.
No doubt that the emergence of Hadoop and its ecosystem have provided a new paradigm or architecture to solve large data problems where it provides a low cost and scalable solution which processes terabytes of data in a few hours which earlier could have taken days. But this is only one side of the coin. Hadoop was meant for batch processes while there are bunch of other business use cases that are required to perform analytics and produce business insights in real or near real-time (subseconds SLA). This was called real-time analytics (RTA) or near real-time analytics (NRTA) and sometimes it was also termed as "fast data" where it implied the ability to make near real-time decisions and enable "orders-of-magnitude" improvements in elapsed time to decisions for businesses.
A number of powerful, easy to use open source platforms have emerged to solve these enterprise real-time analytics data use cases. Two of the most notable ones are Apache Storm and Apache Spark, which offer real-time data processing and analytics capabilities to a much wider range of potential users. Both projects are a part of the Apache Software Foundation and while the two tools provide overlapping capabilities, they still have distinctive features and different roles to play.
Interesting isn't it?
Let's move forward and jump into the nitty gritty of real-time Big Data analytics with Apache Storm and Apache Spark. This book provides you with the skills required to quickly design, implement, and deploy your real-time analytics using real-world examples of Big Data use cases.