Estimate pi
We can use map and reduce to estimate pi if we have code like this:
import pyspark
import random

# Reuse the SparkContext if one already exists (as it does in a notebook).
if 'sc' not in globals():
    sc = pyspark.SparkContext()

NUM_SAMPLES = 10000
random.seed(113)

def sample(p):
    # Pick a random point in the unit square and report whether it
    # lands inside the quarter circle x*x + y*y < 1.
    x, y = random.random(), random.random()
    return 1 if x*x + y*y < 1 else 0

count = sc.parallelize(range(0, NUM_SAMPLES)) \
    .map(sample) \
    .reduce(lambda a, b: a + b)

print("Pi is roughly %f" % (4.0 * count / NUM_SAMPLES))
This code has the same preamble as the earlier examples. We are using the Python random package, and there is a constant, NUM_SAMPLES, for the number of samples to attempt. We call the parallelize function to split the samples across the available nodes, and map applies the sample function to each element. Finally, reduce adds up all of the samples, collapsing the RDD to the single number stored in count.
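One subtlety worth noting: random.seed(113) runs in the driver process, but map applies sample inside separate worker processes, so the seed does not actually make the distributed run repeatable. If repeatability matters, each partition can seed its own generator instead. A minimal sketch, assuming the same sc and NUM_SAMPLES as above (the sample_partition helper is ours, not part of the original code):

def sample_partition(index, iterator):
    # Seed a private generator per partition so reruns give the same result.
    rng = random.Random(113 + index)
    return (1 if rng.random() ** 2 + rng.random() ** 2 < 1 else 0
            for _ in iterator)

count = sc.parallelize(range(0, NUM_SAMPLES)) \
    .mapPartitionsWithIndex(sample_partition) \
    .reduce(lambda a, b: a + b)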
The sample function draws two random numbers between 0 and 1 and returns a one or a zero depending on whether the point they form falls inside the unit circle. We are looking for random points that land within the quarter circle x² + y² < 1.
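Why this works: the points fall uniformly in the unit square, and the quarter circle covers π/4 of its area, so the average of the ones and zeros converges to π/4; multiplying by 4 recovers pi. A quick plain-Python sketch of the same test shows the estimate tightening as the sample count grows, no Spark needed (the estimate_pi helper is illustrative, not part of the original example):

import random

def estimate_pi(n, seed=113):
    # Same Monte Carlo test as sample(), run locally.
    rng = random.Random(seed)
    inside = sum(1 for _ in range(n)
                 if rng.random() ** 2 + rng.random() ** 2 < 1)
    return 4.0 * inside / n

for n in (100, 10000, 1000000):
    print("n=%d -> %f" % (n, estimate_pi(n)))

The error of a Monte Carlo estimate like this shrinks roughly as 1/√n, so each additional digit of pi costs about a hundred times more samples.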