Defining new Cascalog operators
Cascalog comes with a number of operators; however, you'll often need to define your own, as we saw in the Aggregating data in Cascalog recipe.
Cascalog defines several categories of operators, each with different properties. Some run in the map phase of processing, and some run in the reduce phase. Operators in the map phase can take advantage of a number of extra optimizations, so if you can push some of your processing into that stage, you'll get better performance. In this recipe, you'll see which categories of operators run on the map side and which run on the reduce side. We'll also provide an example of each and see how they fit into the larger processing model.
Getting ready
For this recipe, we'll use the same dependencies and inclusions that we used in the Initializing Cascalog and Hadoop for distributed processing recipe. We'll also use the Doctor Who companion data from that recipe.
How to do it…
As I mentioned, Cascalog allows...