Computing prime numbers using parallel operations
A classic method for finding prime numbers is the sieve of Eratosthenes. The approach used here is a per-number variant of that idea: each number is tested against the criteria for a prime, and only numbers that meet the criteria pass through the filter.
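For comparison, the classical sieve of Eratosthenes crosses out the multiples of each prime instead of testing numbers one at a time. The following is a minimal sketch of that idea in plain Python; the function name `sieve` and the variable names are illustrative assumptions, not part of the recipe's script.

```python
def sieve(limit):
    """Return all primes <= limit using the sieve of Eratosthenes.

    Illustrative helper (hypothetical name), not taken from the recipe.
    """
    if limit < 2:
        return []
    # is_candidate[n] stays True until n is crossed out as a multiple
    is_candidate = [True] * (limit + 1)
    is_candidate[0] = is_candidate[1] = False
    p = 2
    while p * p <= limit:
        if is_candidate[p]:
            # cross out multiples of p, starting at p*p
            for multiple in range(p * p, limit + 1, p):
                is_candidate[multiple] = False
        p += 1
    return [n for n, flag in enumerate(is_candidate) if flag]


print(sieve(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The sieve shares one table across all numbers, so it is harder to split into independent pieces. A per-number test has no shared state, which is why it parallelizes so easily.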
The same series of tests is run on every number we check, which makes this a good fit for parallel operations. Spark has the built-in ability to split a task among the available threads or machines. The threads are configured through the SparkContext, as in every example.
In our case, we split up the workload among the available threads, each taking a set of numbers to check, and collect the results later on.
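The split-and-collect pattern described above could be sketched as follows. This is a sketch under stated assumptions, not the recipe's own script: the names `is_prime`, `find_primes`, `main`, the `"local[*]"` master, the application name, and the slice count are all illustrative. The only Spark calls used are `parallelize`, `filter`, and `collect`.

```python
def is_prime(n):
    """Trial-division primality test (illustrative helper)."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3 are prime
    if n % 2 == 0:
        return False
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def find_primes(sc, upper, num_slices=8):
    """Check 2..upper in parallel and collect the primes on the driver.

    `sc` is an existing SparkContext; `num_slices` is an assumed
    partition count, so each worker receives a chunk of the range.
    """
    numbers = sc.parallelize(range(2, upper + 1), num_slices)
    return numbers.filter(is_prime).collect()


def main():
    # Imported here so the helpers above can be read without Spark installed.
    from pyspark import SparkConf, SparkContext

    # "local[*]" (assumed) runs Spark locally with one thread per core.
    conf = SparkConf().setMaster("local[*]").setAppName("prime-sketch")
    sc = SparkContext.getOrCreate(conf)
    print(find_primes(sc, 50))
```

Calling `main()` in an environment with PySpark installed runs the check across the local threads. Passing `sc` as a parameter keeps `find_primes` independent of how the context was created.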
How to do it...
We can use a script like this:
    import pyspark
    if 'sc' not in globals():
        sc = pyspark.SparkContext()

    # check if a number is prime
    def isprime(n):
        # must be positive
        n = abs(int(n))
        # 2 or more
        if n < 2:
            return False
        # 2 is the only even prime number
        if...