What this book covers
Chapter 1, Hadoop and Big Data, describes how Hadoop, since its beginnings in the previous decade, has played a pivotal role in the success of several Internet businesses built on big data. This chapter covers a brief history of Hadoop and the story of its evolution. It covers the Hadoop architecture and the MapReduce data processing framework, introduces basic Hadoop programming in Java, and provides a detailed overview of the business cases covered in the following chapters of this book. This chapter builds the foundation for understanding the rest of the book.
Chapter 2, A 360-Degree View of the Customer, covers building a 360-degree view of the customer. A good 360-degree view requires integrating data from various sources, such as database management systems storing master and transactional data, as well as data captured from social media feeds. In this chapter, we will integrate data from CRM systems, web logs, and Twitter feeds to build the 360-degree view and present it using a simple web interface. We will learn about Apache Sqoop and Apache Hive in the process of building our solution.
Chapter 3, Building a Fraud Detection System, covers building a real-time fraud detection system. This system predicts whether a financial transaction could be fraudulent by applying a clustering algorithm to a stream of transactions. We will learn about the architecture of the system and the coding steps involved in building it. We will learn about Apache Spark in the process of building our solution.
Chapter 4, Marketing Campaign Planning, shows how to build a system that can improve the effectiveness of marketing campaigns. This is a batch analytics system that uses historical campaign-response data to predict who is going to respond to a marketing folder. We will see how to build a predictive model and use it to predict who will respond to which folder in our marketing campaign. We will learn about BigML in the process of building our solution.
Chapter 5, Churn Detection, explains how to use Hadoop to predict which customers are likely to move over to another company. We will cover the business case of a mobile telecom provider who would like to detect the customers who are likely to churn, so that these customers can be given special incentives to stay with the same provider. We will apply Bayes' Theorem to calculate the likelihood of churn, and the model for churn detection will be built using Hadoop. We will learn about writing MapReduce programs in Java in the process of building our solution.
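To give a flavor of the idea behind Chapter 5, Bayes' Theorem expresses the churn likelihood in terms of quantities that can be estimated from historical data; the event names below are only an illustrative sketch, not the exact model built in that chapter:

$$P(\text{churn} \mid \text{behavior}) = \frac{P(\text{behavior} \mid \text{churn})\,P(\text{churn})}{P(\text{behavior})}$$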
Chapter 6, Analyze Sensor Data Using Hadoop, is about how to build a system to analyze sensor data. Nowadays, sensors are an important source of big data, and we will learn how Hadoop and big-data technologies can be helpful in the Internet of Things (IoT) domain. IoT is a network of connected devices that generate data through sensors. We will build a system to monitor environmental conditions, such as humidity and temperature, in a factory. We will introduce Apache Kafka, Grafana, and OpenTSDB in the process of building the solution.
Chapter 7, Building a Data Lake, takes you through building a data lake using Hadoop and several other tools to import data into the data lake and provide secure access to it. Data lakes are a popular business case for Hadoop. In a data lake, we store data from multiple sources to create a single source of data for the enterprise, with a security layer around it. We will learn about Apache Ranger, Apache Flume, and Apache Zeppelin in the process of building our solution.
Chapter 8, Future Directions, covers four separate topics that are relevant to Hadoop-based projects: building a Hadoop solutions team, Hadoop on the cloud, NoSQL databases, and in-memory databases. Unlike the other chapters, this chapter does not include any coding examples. These four topics are covered in essay form so that you can explore them further.