Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Mastering Concurrency in Python

You're reading from   Mastering Concurrency in Python Create faster programs using concurrency, asynchronous, multithreading, and parallel programming

Arrow left icon
Product type Paperback
Published in Nov 2018
Publisher Packt
ISBN-13 9781789343052
Length 446 pages
Edition 1st Edition
Languages
Concepts
Arrow right icon
Author (1):
Arrow left icon
Quan Nguyen Quan Nguyen
Author Profile Icon Quan Nguyen
Quan Nguyen
Arrow right icon
View More author details
Toc

Table of Contents (22) Chapters Close

Preface 1. Advanced Introduction to Concurrent and Parallel Programming FREE CHAPTER 2. Amdahl's Law 3. Working with Threads in Python 4. Using the with Statement in Threads 5. Concurrent Web Requests 6. Working with Processes in Python 7. Reduction Operators in Processes 8. Concurrent Image Processing 9. Introduction to Asynchronous Programming 10. Implementing Asynchronous Programming in Python 11. Building Communication Channels with asyncio 12. Deadlocks 13. Starvation 14. Race Conditions 15. The Global Interpreter Lock 16. Designing Lock-Based and Mutex-Free Concurrent Data Structures 17. Memory Models and Operations on Atomic Types 18. Building a Server from Scratch 19. Testing, Debugging, and Scheduling Concurrent Applications 20. Assessments 21. Other Books You May Enjoy

What is concurrency?

It is estimated that the amount of data that needs to be processed by computer programs doubles every two years. The International Data Corporation (IDC), for example, estimates that, by 2020, there will be 5,200 GB of data for every person on earth. With this staggering volume of data come insatiable demands for computing power, and, while numerous computing techniques are being developed and utilized every day, concurrent programming remains one of the most prominent ways to effectively and accurately process data.

While some might be intimidated when the word concurrency appears, the notion behind it is quite intuitive, and it is very common, even in a non-programming context. However, this is not to say that concurrent programs are as simple as sequential ones; they are indeed more difficult to write and understand. Yet, once a correct and effective concurrent structure is achieved, significant improvement in execution time will follow, as you will see later on.

Concurrent versus sequential

Perhaps the most obvious way to understand concurrent programming is to compare it to sequential programming. While a sequential program is in one place at a time, in a concurrent program, different components are in independent, or semi-independent, states. This means that components in different states can be executed independently, and therefore at the same time (as the execution of one component does not depend on the result of another). The following diagram illustrates the basic differences between these two types:

Difference between concurrent and sequential programs

One immediate advantage of concurrency is an improvement in execution time. Again, since some tasks are independent and can therefore be completed at the same time, less time is required for the computer to execute the whole program.

Example 1 – checking whether a non-negative number is prime

Let's consider a quick example. Suppose that we have a simple function that checks whether a non-negative number is prime, as follows:

# Chapter01/example1.py

from math import sqrt

def is_prime(x):
if x < 2:
return False

if x == 2:
return True

if x % 2 == 0:
return False

limit = int(sqrt(x)) + 1
for i in range(3, limit, 2):
if x % i == 0:
return False

return True

Also, suppose that we have a list of significantly large integers (1013 to 1013 + 500), and we want to check whether each of them is prime by using the preceding function:

input = [i for i in range(10 ** 13, 10 ** 13 + 500)]

A sequential approach would be to simply pass one number after another to the is_prime() function, as follows:

# Chapter01/example1.py

from timeit import default_timer as timer

# sequential
start = timer()
result = []
for i in input:
if is_prime(i):
result.append(i)
print('Result 1:', result)
print('Took: %.2f seconds.' % (timer() - start))

Copy the code or download it from the GitHub repository and run it (using the python example1.py command). The first section of your output will be something similar to the following:

> python example1.py
Result 1: [10000000000037, 10000000000051, 10000000000099, 10000000000129, 10000000000183, 10000000000259, 10000000000267, 10000000000273, 10000000000279, 10000000000283, 10000000000313, 10000000000343, 10000000000391, 10000000000411, 10000000000433, 10000000000453]
Took: 3.41 seconds.

You can see that the program took around 3.41 seconds to process all of the numbers; we will come back to this number soon. For now, it will also be beneficial for us to check how hard the computer was working while running the program. Open an Activity Monitor application in your operating system, and run the Python script again; the following screenshot shows my results:

Activity Monitor showing computer performance

Evidently, the computer was not working too hard, as it was nearly 83% idle.

Now, let's see if concurrency can actually help us to improve our program. The is_prime() function contains a lot of heavy computation, and therefore it is a good candidate for concurrent programming. Since the process of passing one number to the is_prime() function is independent from passing another, we could potentially apply concurrency to our program, as follows:

# Chapter01/example1.py

# concurrent
start = timer()
result = []
with concurrent.futures.ProcessPoolExecutor(max_workers=20) as executor:
futures = [executor.submit(is_prime, i) for i in input]

for i, future in enumerate(concurrent.futures.as_completed(futures)):
if future.result():
result.append(input[i])

print('Result 2:', result)
print('Took: %.2f seconds.' % (timer() - start))

Roughly speaking, we are splitting the tasks into different, smaller chunks, and running them at the same time. Don't worry about the specifics of the code for now, as we will discuss this use of a pool of processes in greater detail later on.

When I executed the function, the execution time was noticeably better, and the computer also used more of its resources, being only 37% idle:

> python example1.py
Result 2: [10000000000183, 10000000000037, 10000000000129, 10000000000273, 10000000000259, 10000000000343, 10000000000051, 10000000000267, 10000000000279, 10000000000099, 10000000000283, 10000000000313, 10000000000391, 10000000000433, 10000000000411, 10000000000453]
Took: 2.33 seconds

The output of the Activity Monitor application will look something like the following:

Activity Monitor showing computer performance

Concurrent versus parallel

At this point, if you have had some experience in parallel programming, you might be wondering whether concurrency is any different from parallelism. The key difference between concurrent and parallel programming is that, while in parallel programs there are a number of processing flows (mainly CPUs and cores) working independently all at once, there might be different processing flows (mostly threads) accessing and using a shared resource at the same time in concurrent programs.

Since this shared resource can be read and overwritten by any of the different processing flows, some form of coordination is required at times, when the tasks that need to be executed are not entirely independent from one another. In other words, it is important for some tasks to be executed after the others, to ensure that the programs will produce the correct results.

Difference between concurrency and parallelism

The preceding figure illustrates the difference between concurrency and parallelism: while in the upper section, parallel activities (in this case, cars) that do not interact with each other can run at the same time, in the lower section, some tasks have to wait for others to finish before they can be executed.

We will look at more examples of these distinctions later on.

A quick metaphor

Concurrency is a quite difficult concept to fully grasp immediately, so let's consider a quick metaphor, in order to make concurrency and its differences from parallelism easier to understand.

Although some neuroscientists might disagree, let's briefly assume that different parts of the human brain are responsible for performing separate, exclusive body part actions and activities. For example, the left hemisphere of the brain controls the right side of the body, and hence, the right hand (and vice versa); or, one part of the brain might be responsible for writing, while another solely processes speaking.

Now, let's consider the first example, specifically. If you want to move your left hand, the right side of your brain (and only the right side) has to process that command to move, which means that the left side of your brain is free to process other information. So, it is possible to move and use the left and right hands at the same time, in order to do different things. Similarly, it is possible to be writing and talking at the same time.

That is parallelism: where different processes don't interact with, and are independent of, each other. Remember that concurrency is not quite like parallelism. Even though there are instances where processes are executed together, concurrency also involves sharing the same resources. If parallelism is similar to using your left and right hands for independent tasks at the same time, concurrency can be associated with juggling, where the two hands perform different tasks simultaneously, but they also interact with the same object (in this case, the juggling balls), and some form of coordination between the two hands is therefore required.

You have been reading a chapter from
Mastering Concurrency in Python
Published in: Nov 2018
Publisher: Packt
ISBN-13: 9781789343052
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image