Speeding up merge with caching and optimization settings
In this recipe, we will start with a simple stream involving two Merge nodes. Although the sample size in this example is not especially large, we will explore how you could speed up the stream if you were experiencing performance issues. In effect, we are making a trade: spending available hard drive space to reduce the load on the processor. Used this way, Modeler should be able to process millions of rows even if you are restricted to a client copy of Modeler. Note that if you are experiencing these kinds of problems during Deployment, you should probably pursue a more complete solution. If, however, it is a data prep challenge, this approach should help you get past the problem; then, during modeling, you should consider working with a random sample.
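The recipe itself works entirely through Modeler's interface. The following is only a conceptual sketch in plain Python, not Modeler's scripting API, of the same disk-for-processor trade: an expensive join is computed once, written to disk, and read back on later runs instead of being recomputed. The function names `hash_join` and `cached_join`, the pickle cache file, and the SHA-256 fingerprint of the inputs are all illustrative assumptions.

```python
import hashlib
import os
import pickle
import tempfile


def hash_join(left, right, key):
    """Inner join two lists of dicts on `key` using an in-memory hash index."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


def cached_join(left, right, key, cache_dir):
    """Hypothetical helper: reuse a join result pickled to disk if one exists.

    Illustrates trading disk space for CPU time; this is not how Modeler
    implements its node cache internally.
    """
    # Fingerprint the inputs so that changed inputs produce a new cache file.
    digest = hashlib.sha256(pickle.dumps((left, right, key))).hexdigest()
    path = os.path.join(cache_dir, f"join_{digest}.pkl")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f), True  # cache hit: read from disk
    result = hash_join(left, right, key)
    with open(path, "wb") as f:
        pickle.dump(result, f)  # cache miss: compute, then store on disk
    return result, False


if __name__ == "__main__":
    customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bo"}]
    orders = [{"id": 1, "total": 30}, {"id": 1, "total": 12}, {"id": 3, "total": 5}]
    with tempfile.TemporaryDirectory() as cache_dir:
        first, hit = cached_join(customers, orders, "id", cache_dir)
        print(hit, first)   # first run computes the join
        second, hit = cached_join(customers, orders, "id", cache_dir)
        print(hit, second)  # second run reads the cached result
```

The first call pays the full cost of the join and the disk write; every later call with identical inputs only pays for reading the file, which is the same reason caching upstream of a slow Merge node can help.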
Getting ready
We will start with the stream SpeedUpMerge.str.
How to do it...
To speed up a Merge node by using a cache and optimization settings:
- Open the stream SpeedUpMerge.str. To run the entire stream on...