Architecting the solution
To architect the solution, let's first summarize the analysis from the previous section. The key conclusions we can draw are:
- This is a real-time data engineering problem
- This problem can be solved using a streaming platform such as Kafka or Kinesis
- Around 1 million events will be published daily, and the volume may grow over time
- The solution should be hosted on a hybrid platform, where data processing and analysis are done on-premise and the results are stored in the cloud for easy retrieval
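To put the daily volume in perspective, a quick back-of-the-envelope calculation converts 1 million events per day into a sustained rate. This is only a sketch: real traffic is usually bursty rather than uniform, and the 1 KB average event size is a placeholder assumption, not a figure from the requirements.

```python
# Back-of-the-envelope throughput estimate for 1 million events/day.
EVENTS_PER_DAY = 1_000_000
AVG_EVENT_SIZE_BYTES = 1_024  # hypothetical average payload size

events_per_second = EVENTS_PER_DAY / (24 * 60 * 60)
bytes_per_second = events_per_second * AVG_EVENT_SIZE_BYTES

print(f"~{events_per_second:.1f} events/sec")     # ~11.6 events/sec
print(f"~{bytes_per_second / 1024:.1f} KiB/sec")  # ~11.6 KiB/sec
```

An average of roughly 12 events per second is modest for a streaming platform, which is why the real sizing concern is peak (burst) load and future growth rather than the daily total.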
Since the streaming platform must run on-premise, Apache Kafka is a great choice: it can be deployed and maintained on our own servers, and it supports a distributed, fault-tolerant, and reliable architecture. Kafka scales easily by increasing the number of partitions, and it provides an at-least-once delivery guarantee: every event is delivered at least once, so no events are dropped, although duplicates are possible and consumers may need to handle them.
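To see why at-least-once delivery implies possible duplicates, consider a producer that retries until the broker acknowledges the write. The toy loop below is plain Python, not Kafka client code; the "flaky broker" that stores the event but loses the first acknowledgment is entirely simulated for illustration.

```python
broker_log = []    # events durably stored by the simulated broker
ack_attempts = {}  # number of send attempts seen per event id

def flaky_send(event):
    """Simulated broker: the write always lands, but the first ack is lost."""
    broker_log.append(event)                 # write succeeds...
    n = ack_attempts.get(event["id"], 0) + 1
    ack_attempts[event["id"]] = n
    return n > 1                             # ...but the first ack never arrives

def send_at_least_once(event, max_retries=10):
    """Retry until acknowledged: no events dropped, but duplicates possible."""
    for _ in range(max_retries):
        if flaky_send(event):
            return
    raise RuntimeError("broker unreachable after retries")

for i in range(3):
    send_at_least_once({"id": i})

ids = [e["id"] for e in broker_log]
print(sorted(set(ids)))  # [0, 1, 2] -- every event arrived
print(len(ids))          # 6 -- each event was written twice (duplicate on retry)
```

Because the producer cannot distinguish "write failed" from "write succeeded but the ack was lost", it must resend, which is exactly how a retrying Kafka producer can store the same event twice while guaranteeing none are lost.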
Now, let...