Partitioning and how it works
As the volume of incoming data increases, it can overwhelm the processing capabilities of the application resulting in increasing latencies and reduced throughput. It is rarely the case that the resources of the entire application are inadequate; instead, a careful analysis often reveals one or more bottlenecks. Addressing these bottlenecks will often resolve the problem. If the input data rate continues to increase, it may again cross the processability threshold, at which point the analysis must be repeated to find and resolve the new bottlenecks.
The modus operandi for addressing a bottleneck can take several forms, depending on the nature of the application, its configuration, the cluster environment, and other factors, for example:
- Use a faster algorithm if available and compute resources are the constraint
- Use more space-efficient algorithms and increase the memory allocation if excessive garbage collection (GC) calls are observed
- Use additional cluster nodes...