Tuning parallelism in Storm – scaling a distributed computation
To explain parallelism in Storm, we will configure three parameters:
- The number of workers
- The number of executors
- The number of tasks
The following figure gives a diagrammatic explanation of an example where we have a topology with just one spout and one bolt. In this case, we will set different values for the numbers of workers, executors, and tasks at the spout and bolt levels, and see how parallelism works in each case:
// Assume we have two workers in total for the topology.
topology.workers: 2

// Just one executor for the spout.
builder.setSpout("spout-sentence", new TwitterStreamSpout(), 1);

// Two executors for the bolt.
builder.setBolt("bolt-split", new SplitSentenceBolt(), 2)
    // Four tasks for the bolt.
    .setNumTasks(4)
    .shuffleGrouping("spout-sentence");
For this configuration, we will have two workers, which will run in separate JVMs (worker 1 and worker 2).
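The arithmetic behind this configuration can be sketched in plain Java, without any Storm dependency. This is only an illustrative model, not Storm's scheduler: the class `ParallelismSketch` and the helper `tasksPerExecutor` are hypothetical names, and the sketch assumes that tasks are divided as evenly as possible among a component's executors and that a component without `setNumTasks` gets one task per executor.

```java
import java.util.ArrayList;
import java.util.List;

public class ParallelismSketch {

    // Hypothetical helper: split `tasks` as evenly as possible over `executors`.
    // Assumption: every executor runs at least one task.
    public static List<Integer> tasksPerExecutor(int tasks, int executors) {
        if (executors <= 0 || tasks < executors) {
            throw new IllegalArgumentException("need at least one task per executor");
        }
        List<Integer> counts = new ArrayList<>();
        for (int i = 0; i < executors; i++) {
            // The first (tasks % executors) executors take one extra task.
            counts.add(tasks / executors + (i < tasks % executors ? 1 : 0));
        }
        return counts;
    }

    public static void main(String[] args) {
        int workers = 2;          // topology.workers: 2
        int spoutExecutors = 1;   // parallelism hint passed to setSpout
        int boltExecutors = 2;    // parallelism hint passed to setBolt
        int boltTasks = 4;        // setNumTasks(4)
        // Assumption: with no setNumTasks call, tasks equal executors.
        int spoutTasks = spoutExecutors;

        System.out.println("workers (JVMs): " + workers);
        System.out.println("total executors (threads): " + (spoutExecutors + boltExecutors));
        System.out.println("spout tasks per executor: " + tasksPerExecutor(spoutTasks, spoutExecutors));
        System.out.println("bolt tasks per executor: " + tasksPerExecutor(boltTasks, boltExecutors));
    }
}
```

Under these assumptions, the three executors (one for the spout, two for the bolt) are shared between the two worker JVMs, and each bolt executor runs two of the four bolt tasks. How the executors are placed on the two workers is decided by Storm's scheduler and is not modeled here.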
For the spout, there is one executor, and the default...