Remember the RatingsHistogram code that we ran as your first Spark program? Understanding concepts is all well and good, but nothing beats looking at a real example, so let's go back to it. We'll break the script down and see exactly what it's doing under the hood, and how it uses RDDs to produce the histogram of ratings.
Ratings histogram walk-through
Understanding the code
The first couple of lines are just boilerplate stuff. One thing you'll see in every Python Spark script is the import statement...