Partitioning toolkit
As described in the previous section, partitioning is appropriate when an operator is likely to become a bottleneck. An operator is a bottleneck if it cannot process the input stream at the required speed, causing tuples to back up in upstream buffers. Often, this also means lower throughput and higher latency between the time an input tuple enters the operator's input port and the time the corresponding computed output tuple(s) leave the output port(s). If such an increase in latency or reduction in throughput is transient (lasting no more than a few seconds), partitioning may not be needed, since OS and platform buffering will allow the operator to catch up once the spike in input has passed. Partitioning in that case may even be detrimental, because it interrupts processing while existing operators are brought down and new operators are started.
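The distinction between a transient spike and a sustained overload can be sketched as a simple check over latency samples. This is only an illustration and not part of any platform API: the function name `sustained_overload`, the sample format of (timestamp in seconds, latency in milliseconds), and the thresholds are all assumptions chosen for the example.

```python
from typing import Iterable, Tuple


def sustained_overload(samples: Iterable[Tuple[float, float]],
                       latency_limit_ms: float,
                       max_transient_secs: float) -> bool:
    """Return True if latency stays above the limit for longer than
    max_transient_secs without ever dropping back to or below the limit.

    samples: (timestamp_secs, latency_ms) pairs in increasing time order.
    All names and units here are hypothetical, for illustration only.
    """
    overload_start = None  # timestamp at which the current overload began
    for timestamp, latency in samples:
        if latency > latency_limit_ms:
            if overload_start is None:
                overload_start = timestamp
            if timestamp - overload_start > max_transient_secs:
                return True  # overload outlasted the transient window
        else:
            overload_start = None  # operator caught up; reset the window
    return False


# A two-second spike that buffering can absorb.
spike = [(0, 10), (1, 500), (2, 600), (3, 12), (4, 11)]
# Latency that stays high for the whole observation period.
steady = [(t, 500) for t in range(20)]

print(sustained_overload(spike, latency_limit_ms=100, max_transient_secs=5))
print(sustained_overload(steady, latency_limit_ms=100, max_transient_secs=5))
```

The five-second window stands in for "a few seconds" and would need tuning for a real workload; the same check could be driven by upstream buffer depth or throughput instead of latency.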
However, if the input data rates are likely to remain high for an extended period, partitioning may be needed. In any case,...