The first question you may ask is why a stream processing pipeline needs external data lookups at all. The answer is that you sometimes need to enrich, validate, or filter incoming events based on data held in an external system that changes frequently. In a streaming design, however, these lookups pose several challenges. Frequent calls to external systems can increase end-to-end latency, because each event may have to wait for a remote round trip. Holding all of the external reference data in memory is often not an option, because these datasets can be too large to fit. They may also change too frequently for an in-memory copy to be kept up to date. Finally, if an external system goes down, the streaming pipeline stalls or fails along with it, so the external system becomes a bottleneck for the whole solution.
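To make the latency and availability concerns concrete, here is a minimal sketch of naive per-event enrichment, assuming a hypothetical `ExternalReferenceStore` that stands in for a remote system by sleeping for a fixed simulated latency on every call. The names `Event`, `ExternalReferenceStore`, and `enrich` are illustrative only, not part of any particular streaming framework. The sketch shows that every event triggers one remote call, so lookup latency adds directly to processing time, and that an unavailable store raises an error that halts the pipeline.

```python
import time
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional


@dataclass
class Event:
    user_id: str
    amount: float


class ExternalReferenceStore:
    """Hypothetical stand-in for a remote reference system (e.g. a customer database)."""

    def __init__(self, data: Dict[str, str], latency_s: float = 0.01, available: bool = True):
        self._data = data
        self.latency_s = latency_s   # simulated network round-trip time per call
        self.available = available   # set to False to simulate an outage
        self.calls = 0               # counts remote calls made so far

    def lookup(self, key: str) -> Optional[str]:
        self.calls += 1
        if not self.available:
            # An outage surfaces as an exception inside the pipeline.
            raise ConnectionError("reference store unavailable")
        time.sleep(self.latency_s)   # every lookup pays the round-trip cost
        return self._data.get(key)


def enrich(events: Iterable[Event], store: ExternalReferenceStore) -> Iterator[dict]:
    """Naive enrichment: one synchronous remote lookup per incoming event."""
    for event in events:
        segment = store.lookup(event.user_id)
        if segment is None:
            # Validation/filtering: drop events whose key is unknown to the store.
            continue
        yield {"user_id": event.user_id, "amount": event.amount, "segment": segment}


if __name__ == "__main__":
    store = ExternalReferenceStore({"u1": "gold", "u2": "silver"}, latency_s=0.05)
    events = [Event("u1", 10.0), Event("u3", 5.0), Event("u2", 7.5)]
    start = time.perf_counter()
    enriched = list(enrich(events, store))
    elapsed = time.perf_counter() - start
    print(enriched)
    print(f"{store.calls} remote calls, {elapsed:.2f}s spent")
```

With three events and a 50 ms simulated round trip, the run spends at least 150 ms in lookups alone, and that cost grows linearly with event volume. Setting `available=False` on the store makes `enrich` raise `ConnectionError` on the first event, which is the bottleneck behavior described above.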
Keeping these challenges in mind, there are three important factors while...