Summary
In this chapter, we learnt about real-time analytics and saw how big data can be used for real-time analytics as well as batch processing. We introduced Impala, a product that can run fast SQL queries on big data, which is usually stored in Parquet format in HDFS. While looking at Impala, we briefly worked through a simple case study on flight analytics. We then covered Apache Kafka, a scalable messaging product that can be used in conjunction with big data technologies to build real-time data stacks, and we showed how it can be integrated with the Spark Streaming module of Apache Spark. Spark Streaming lets you collect data in mini-batches in real time, and it calls a sequence of these mini-batches a stream. Spark Streaming is becoming very popular because it is a scalable solution that fits the needs of many users. Finally, we covered a few case studies using Apache Kafka and Spark Streaming and showed how complex...