Caching and uncaching of graphs
As we discussed in the earlier chapters, if you intend to use a particular RDD multiple times, you should cache it; otherwise, the Spark framework will recompute the RDD from scratch every time it is referenced.
Graphs, like RDDs, are not persisted in memory by default, and caching is the ideal option when a graph is used multiple times. The implementation is quite similar: you simply call cache() on the Graph object:
myGraph.cache()
Uncaching unused objects may also improve performance. Cached RDDs and graphs remain memory resident until they are evicted, in least-recently-used (LRU) order, under memory pressure. In iterative computations, intermediate results from previous iterations fill up the cache, and garbage collection slows down because memory is occupied by data that is no longer needed.
Suggested approach: Uncache intermediate results as soon as they are no longer needed. This can be tricky in practice, because you need to know when a cached graph will no longer be used. A common pattern in iterative computations is to materialize the result of the current iteration and then unpersist the graph from the previous one.
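A minimal sketch of this pattern is shown below. The tiny three-vertex graph, the `local[2]` master, and the dummy vertex-increment update are illustrative assumptions, not part of the text; the point is the shape of the loop, which caches each new intermediate graph, forces its materialization, and only then unpersists the previous one:

```scala
// Sketch only: illustrative graph and update function, not a real workload.
import org.apache.spark.sql.SparkSession
import org.apache.spark.graphx.{Edge, Graph}

val spark = SparkSession.builder()
  .master("local[2]")
  .appName("graph-cache-sketch")
  .getOrCreate()
val sc = spark.sparkContext

// A small example graph: three vertices connected in a line.
val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1)))
var g = Graph.fromEdges(edges, defaultValue = 0).cache()

for (_ <- 1 to 3) {
  // Compute the next iteration's graph and cache it.
  val next = g.mapVertices((_, attr) => attr + 1).cache()
  // Materialize the new graph before dropping the old one,
  // so unpersisting does not trigger a recomputation later.
  next.vertices.count()
  next.edges.count()
  // Uncache the previous iteration's graph as soon as possible.
  g.unpersist(blocking = false)
  g = next
}

val attrs = g.vertices.map(_._2).collect()
// Every vertex attribute has been incremented once per iteration.
println(attrs.mkString(","))
spark.stop()
```

Materializing `next` before calling `unpersist` on the old graph matters: GraphX graphs are computed lazily, so unpersisting the previous iteration too early can force Spark to rebuild its lineage from scratch.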