Let's consider a quick example. Suppose that we have a simple function that checks whether a non-negative number is prime, as follows:
# Chapter01/example1.py
from math import sqrt

def is_prime(x):
    if x < 2:
        return False
    if x == 2:
        return True
    if x % 2 == 0:
        return False
    limit = int(sqrt(x)) + 1
    for i in range(3, limit, 2):
        if x % i == 0:
            return False
    return True
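Before timing the function on large inputs, a quick sanity check on small ones can confirm that it behaves as intended. The snippet below repeats is_prime() so that it runs on its own; the upper bound of 30 is an arbitrary choice for illustration.

```python
from math import sqrt


def is_prime(x):
    if x < 2:
        return False
    if x == 2:
        return True
    if x % 2 == 0:
        return False
    limit = int(sqrt(x)) + 1
    for i in range(3, limit, 2):
        if x % i == 0:
            return False
    return True


# Collect every prime below 30 as a quick correctness check
small_primes = [n for n in range(30) if is_prime(n)]
print(small_primes)  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

The loop only needs to test odd divisors up to the square root of x, because any composite number has a factor no larger than its square root.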
Also, suppose that we have a list of significantly large integers (from 10^13 up to, but not including, 10^13 + 500), and we want to check whether each of them is prime by using the preceding function:
input = [i for i in range(10 ** 13, 10 ** 13 + 500)]
A sequential approach would be to simply pass one number after another to the is_prime() function, as follows:
# Chapter01/example1.py
from timeit import default_timer as timer

# sequential
start = timer()
result = []
for i in input:
    if is_prime(i):
        result.append(i)

print('Result 1:', result)
print('Took: %.2f seconds.' % (timer() - start))
Copy the code or download it from the GitHub repository and run it (using the python example1.py command). The first section of your output will be something similar to the following:
> python example1.py
Result 1: [10000000000037, 10000000000051, 10000000000099, 10000000000129, 10000000000183, 10000000000259, 10000000000267, 10000000000273, 10000000000279, 10000000000283, 10000000000313, 10000000000343, 10000000000391, 10000000000411, 10000000000433, 10000000000453]
Took: 3.41 seconds.
You can see that the program took around 3.41 seconds to process all of the numbers; we will come back to this number soon. For now, it will also be beneficial for us to check how hard the computer was working while running the program. Open an Activity Monitor application in your operating system, and run the Python script again; the following screenshot shows my results:
Activity Monitor showing computer performance
Evidently, the computer was not working too hard, as it was nearly 83% idle.
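To put the idle figure in context, it can help to know how many logical cores the machine has, since a single Python process running this loop keeps roughly one core busy at a time. The sketch below uses os.cpu_count() from the standard library; the helper name single_core_share is made up for illustration, and the figure it prints is a rough estimate, not a measurement.

```python
import os


def single_core_share(cores):
    """Return the percentage of total CPU capacity that one fully busy core represents."""
    return 100.0 / cores


logical_cores = os.cpu_count()  # may be None if the count cannot be determined
if logical_cores:
    print(f"Logical cores: {logical_cores}")
    share = single_core_share(logical_cores)
    print(f"One busy core is about {share:.1f}% of total capacity")
```

On a machine with, say, eight logical cores, one saturated core accounts for about 12.5% of total capacity, so a mostly idle reading during a sequential run is to be expected.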
Now, let's see if concurrency can actually help us to improve our program. The is_prime() function contains a lot of heavy computation, and therefore it is a good candidate for concurrent programming. Since the process of passing one number to the is_prime() function is independent from passing another, we could potentially apply concurrency to our program, as follows:
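# Chapter01/example1.py
import concurrent.futures

# concurrent
# Note: on platforms that start worker processes with "spawn" (Windows, and
# macOS by default), run this code under an `if __name__ == '__main__':` guard.
start = timer()
result = []
with concurrent.futures.ProcessPoolExecutor(max_workers=20) as executor:
    # Map each future back to the number it is checking
    futures = {executor.submit(is_prime, i): i for i in input}

    # as_completed() yields futures in completion order, not submission order
    for future in concurrent.futures.as_completed(futures):
        if future.result():
            result.append(futures[future])

print('Result 2:', result)
print('Took: %.2f seconds.' % (timer() - start))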
Roughly speaking, we are splitting the tasks into different, smaller chunks, and running them at the same time. Don't worry about the specifics of the code for now, as we will discuss this use of a pool of processes in greater detail later on.
When I ran the concurrent version, the execution time was noticeably better, and the computer also used more of its resources, being only 37% idle:
> python example1.py
Result 2: [10000000000183, 10000000000037, 10000000000129, 10000000000273, 10000000000259, 10000000000343, 10000000000051, 10000000000267, 10000000000279, 10000000000099, 10000000000283, 10000000000313, 10000000000391, 10000000000433, 10000000000411, 10000000000453]
Took: 2.33 seconds.
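The two timings reported so far can be compared directly. The following is a small worked calculation using the 3.41 and 2.33 second figures from these particular runs; the numbers on another machine will differ.

```python
sequential_seconds = 3.41  # timing reported for the sequential run
concurrent_seconds = 2.33  # timing reported for the concurrent run

# Speedup: how many times faster the concurrent run was
speedup = sequential_seconds / concurrent_seconds

# Fraction of the original running time that was saved
time_saved = 1 - concurrent_seconds / sequential_seconds

print(f"Speedup: {speedup:.2f}x")       # Speedup: 1.46x
print(f"Time saved: {time_saved:.1%}")  # Time saved: 31.7%
```

A speedup of about 1.46x with 20 worker processes is well short of a 20-fold improvement, which suggests that the overhead of creating processes and passing data between them is significant for a workload of this size.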
The output of the Activity Monitor application will look something like the following:
Activity Monitor showing computer performance
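Note that the Result 2 list is not in ascending order, because results are collected as the worker processes finish. If ordered output matters, one alternative is the executor's map() method, which returns results in the order of its inputs. The following is a minimal sketch of that idea, not the approach used in example1.py; the helper primes_in_order and its map_fn parameter are hypothetical names introduced here for illustration.

```python
from concurrent.futures import ProcessPoolExecutor
from math import sqrt


def is_prime(x):
    if x < 2:
        return False
    if x == 2:
        return True
    if x % 2 == 0:
        return False
    limit = int(sqrt(x)) + 1
    for i in range(3, limit, 2):
        if x % i == 0:
            return False
    return True


def primes_in_order(numbers, map_fn=map):
    """Return the primes in `numbers`, preserving input order.

    `map_fn` is any map-like callable: the built-in map (sequential)
    or an executor's map method (concurrent).
    """
    numbers = list(numbers)
    flags = map_fn(is_prime, numbers)
    return [n for n, flag in zip(numbers, flags) if flag]


if __name__ == '__main__':
    numbers = range(10 ** 13, 10 ** 13 + 500)
    # With no max_workers argument, the pool size is chosen by the library
    with ProcessPoolExecutor() as executor:
        primes = primes_in_order(numbers, executor.map)
    print(primes == sorted(primes))  # True: results follow the input order
```

Passing the mapping function as a parameter keeps the filtering logic identical for the sequential and concurrent cases, so the two can be compared or tested with the same helper.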