This section walks through a small case study that uses Kafka and Spark Streaming to detect fraudulent IP addresses, that is, IPs that have attempted to hit the server many times. The application consists of the following components:
- Producer: The Kafka producer API will be used to read a log file and publish its records to a Kafka topic. In a real deployment, however, we could use Flume or a producer application that captures records in real time and publishes them directly to Kafka.
- Fraud IPs list: We will keep a predefined list of fraudulent IPs and check incoming IPs against it. For this application we use an in-memory list, which could be replaced by a store that supports fast key-based lookups, such as HBase.
- Spark Streaming: The Spark Streaming application reads records from Kafka and detects suspicious IPs and domains.
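The detection logic these components share can be illustrated without Kafka or Spark at all. The following is a minimal plain-Java sketch, not the case study's actual code: it assumes each log record begins with the client IP (as in common log format), and the names `IpLookup`, `InMemoryIpLookup`, `extractIp`, and `detect` are hypothetical. The `IpLookup` interface reflects the design point above: the in-memory set could later be swapped for a key-based store such as HBase without changing the detection code. The `detect` method shows what a streaming job might do with one micro-batch: keep only IPs on the fraud list, count them, and flag those seen at least a threshold number of times (the threshold is also an assumption).

```java
import java.util.*;

public class FraudIpSketch {

    // Hypothetical lookup abstraction: backed by an in-memory set here, but the
    // same interface could be implemented on top of a key-value store like HBase.
    interface IpLookup {
        boolean isFraudIp(String ip);
    }

    // In-memory implementation: exact match against a predefined set of IPs.
    static class InMemoryIpLookup implements IpLookup {
        private final Set<String> fraudIps = new HashSet<>();

        InMemoryIpLookup(Collection<String> ips) {
            fraudIps.addAll(ips);
        }

        @Override
        public boolean isFraudIp(String ip) {
            return fraudIps.contains(ip);
        }
    }

    // Assumption: the client IP is the first whitespace-separated field of a record.
    static String extractIp(String logLine) {
        String trimmed = logLine.trim();
        int space = trimmed.indexOf(' ');
        return space < 0 ? trimmed : trimmed.substring(0, space);
    }

    // Logic a streaming job might apply to one micro-batch: count records whose IP
    // is on the fraud list, then keep only IPs seen at least `threshold` times.
    static Map<String, Integer> detect(List<String> batch, IpLookup lookup, int threshold) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : batch) {
            String ip = extractIp(line);
            if (lookup.isFraudIp(ip)) {
                counts.merge(ip, 1, Integer::sum);
            }
        }
        counts.values().removeIf(c -> c < threshold);
        return counts;
    }

    public static void main(String[] args) {
        IpLookup lookup = new InMemoryIpLookup(List.of("10.20.30.5", "192.168.1.7"));
        List<String> batch = List.of(
                "10.20.30.5 - - [GET /index.html]",
                "10.20.30.5 - - [GET /login]",
                "10.20.30.5 - - [GET /login]",
                "172.16.0.9 - - [GET /index.html]",
                "192.168.1.7 - - [GET /admin]");
        System.out.println(detect(batch, lookup, 2)); // prints {10.20.30.5=3}
    }
}
```

In the real pipeline, the batch would come from the Kafka topic via Spark Streaming rather than from a hard-coded list, and the flagged IPs would be written to an output sink instead of printed.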
Maven is a tool for building and managing projects, and we will build this project...