Passing functions to Spark (Python)
Python provides a simple way to pass functions to Spark. The Spark programming guide, available at spark.apache.org, recommends three ways to do this (a brief sketch of all three follows the list):
- Lambda expressions, ideal for short functions that can be written as a single expression
- Local defs inside the function calling into Spark, for longer code
- Top-level functions in a module
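As a quick illustration of the three styles, here is a minimal sketch; the SparkContext setup and the sample lines are assumptions for demonstration, not part of the original example.

from pyspark import SparkContext

sc = SparkContext("local", "PassingFunctions")
lines = sc.parallelize(["a quick example", "of passing functions"])

# 1. Lambda expression: short logic written inline
counts = lines.map(lambda line: len(line.split(" ")))

# 2. Local def inside the function that calls into Spark
def countWithLocalDef(rdd):
    def countWords(line):  # local definition, shipped to the workers
        return len(line.split(" "))
    return rdd.map(countWords)

# 3. Top-level function in a module
def countWordsTopLevel(line):
    return len(line.split(" "))

counts = lines.map(countWordsTopLevel)

sc.stop()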
While we have already looked at lambda functions in some of the previous examples, let's now look at local definitions. We can encapsulate our business logic, splitting a line into words and aggregating the word counts, into two separate functions, as shown below.
def splitter(lineOfText):
    words = lineOfText.split(" ")
    return len(words)

def aggregate(numWordsLine1, numWordsLineNext):
    totalWords = numWordsLine1 + numWordsLineNext
    return totalWords
Let's see the working code example:
Figure 2.15: Code example of Python word count (local definition...
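Since the figure itself is not reproduced here, the following is a minimal sketch of how splitter and aggregate might be wired into a complete word count; the local SparkContext and the input path sample.txt are assumptions for illustration.

from pyspark import SparkContext

def splitter(lineOfText):
    words = lineOfText.split(" ")
    return len(words)

def aggregate(numWordsLine1, numWordsLineNext):
    totalWords = numWordsLine1 + numWordsLineNext
    return totalWords

if __name__ == "__main__":
    sc = SparkContext("local", "PythonWordCount")
    # sample.txt is a hypothetical input path
    lines = sc.textFile("sample.txt")
    # Map each line to its word count, then sum the counts across all lines
    totalWords = lines.map(splitter).reduce(aggregate)
    print("Total number of words: %d" % totalWords)
    sc.stop()

Here map applies splitter to every line of the file, and reduce repeatedly combines pairs of per-line counts with aggregate until a single total remains.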