Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Mastering Concurrency in Python

You're reading from   Mastering Concurrency in Python Create faster programs using concurrency, asynchronous, multithreading, and parallel programming

Arrow left icon
Product type Paperback
Published in Nov 2018
Publisher Packt
ISBN-13 9781789343052
Length 446 pages
Edition 1st Edition
Languages
Concepts
Arrow right icon
Author (1):
Arrow left icon
Quan Nguyen Quan Nguyen
Author Profile Icon Quan Nguyen
Quan Nguyen
Arrow right icon
View More author details
Toc

Table of Contents (22) Chapters Close

Preface 1. Advanced Introduction to Concurrent and Parallel Programming FREE CHAPTER 2. Amdahl's Law 3. Working with Threads in Python 4. Using the with Statement in Threads 5. Concurrent Web Requests 6. Working with Processes in Python 7. Reduction Operators in Processes 8. Concurrent Image Processing 9. Introduction to Asynchronous Programming 10. Implementing Asynchronous Programming in Python 11. Building Communication Channels with asyncio 12. Deadlocks 13. Starvation 14. Race Conditions 15. The Global Interpreter Lock 16. Designing Lock-Based and Mutex-Free Concurrent Data Structures 17. Memory Models and Operations on Atomic Types 18. Building a Server from Scratch 19. Testing, Debugging, and Scheduling Concurrent Applications 20. Assessments 21. Other Books You May Enjoy

Not everything should be made concurrent

Not all programs are created equal: some can be made parallel or concurrent relatively easily, while others are inherently sequential, and thus cannot be executed concurrently, or in parallel. An extreme example of the former is embarrassingly parallel programs, which can be divided into different parallel tasks, between which there is little or no dependency or need for communication.

Embarrassingly parallel

A common example of an embarrassingly parallel program is the 3D video rendering handled by a graphics processing unit, where each frame or pixel can be processed with no interdependency. Password cracking is another embarrassingly parallel task that can easily be distributed on CPU cores. In a later chapter, we will tackle a number of similar problems, including image processing and web scraping, which can be made concurrent/parallel intuitively, resulting in significantly improved execution times.

Inherently sequential

In opposition to embarrassingly parallel tasks, the execution of some tasks depends heavily on the results of others. In other words, those tasks are not independent, and thus, cannot be made parallel or concurrent. Furthermore, if we were to try to implement concurrency into those programs, it could cost us more execution time to produce the same results. Let's go back to our prime-checking example from earlier; the following is the output that we saw:

> python example1.py
Result 1: [10000000000037, 10000000000051, 10000000000099, 10000000000129, 10000000000183, 10000000000259, 10000000000267, 10000000000273, 10000000000279, 10000000000283, 10000000000313, 10000000000343, 10000000000391, 10000000000411, 10000000000433, 10000000000453]
Took: 3.41 seconds.
Result 2: [10000000000183, 10000000000037, 10000000000129, 10000000000273, 10000000000259, 10000000000343, 10000000000051, 10000000000267, 10000000000279, 10000000000099, 10000000000283, 10000000000313, 10000000000391, 10000000000433, 10000000000411, 10000000000453]
Took: 2.33 seconds.

Pay close attention, and you will see that the two results from the two methods are not identical; the primes in the second result list are out of order. (Recall that, in the second method, to apply concurrency we specified splitting the tasks into different groups to be executed simultaneously, and the order of the results we obtained is the order in which each task finished being executed.) This is a direct result of using concurrency in our second method: we split the tasks to be executed by the program into different groups, and our program processed the tasks in these groups at the same time.

Since tasks across different groups were executed simultaneously, there were tasks that were behind other tasks in the input list, and yet were executed before those other tasks. For example, the number 10000000000183 was behind the number 10000000000129 in our input list, but was processed prior to, and therefore in front of, the number 10000000000129 in our output list. In fact, if you execute the program again and again, the second result will vary in almost every run.

Evidently, this situation is not desirable if the result we'd like to obtain needs to be in the order of the input we originally had. Of course, in this example, we can simply modify the result by using some form of sorting, but it will cost us extra execution time in the end, which might make it even more expensive than the original sequential approach.

A concept that is commonly used to illustrate the innate sequentiality of some tasks is pregnancy: the number of women will never reduce the length of pregnancy. As opposed to parallel or concurrent tasks, where an increase in the number of processing entities will improve the execution time, adding more processors in inherently sequential tasks will not. Famous examples of inherent sequentiality include iterative algorithms: Newton's method, iterative solutions to the three-body problem, or iterative numerical approximation methods.

Example 2 – inherently sequential tasks

Let us consider a quick example:

Computing f1000(3), with f(x) = x2 - x + 1, and fn + 1(x) = f(fn(x)).

With complicated functions like f (where it is relatively difficult to find a general form of fn(x)), the only obviously reasonable way to compute f1000(3) or similar values is to iteratively compute f2(3) = f( f(3)), f3(3) = f( f2(3)), ... , f999(3) = f( f998(3)), and, finally, f1000(3) = f( f999(3)).

Since it will take significant time to actually compute f1000(3), even when using a computer, we will only consider f20(3) in our code (my laptop actually started heating up after f25(3)):

# Chapter01/example2.py

def f(x):
return x * x - x + 1

# sequential
def f(x):
return x * x - x + 1

start = timer()
result = 3
for i in range(20):
result = f(result)

print('Result is very large. Only printing the last 5 digits:', result % 100000)
print('Sequential took: %.2f seconds.' % (timer() - start))

Run it (or use python example2.py); the following code shows the output I received:

> python example2.py
Result is very large. Only printing the last 5 digits: 35443
Sequential took: 0.10 seconds.

Now, if we were to attempt to apply concurrency to this script, the only possible way would be through a for loop. One solution might be as follows:

# Chapter01/example2.py

# concurrent
def concurrent_f(x):
global result
result = f(result)

result = 3

with concurrent.futures.ThreadPoolExecutor(max_workers=20) as exector:
futures = [exector.submit(concurrent_f, i) for i in range(20)]

_ = concurrent.futures.as_completed(futures)

print('Result is very large. Only printing the last 5 digits:', result % 100000)
print('Concurrent took: %.2f seconds.' % (timer() - start))

The output I received is shown as follows:

> python example2.py
Result is very large. Only printing the last 5 digits: 35443
Concurrent took: 0.19 seconds.

Even though both methods produced the same result, the concurrent method took almost twice as long as the sequential method. This is due to the fact that every time a new thread (from ThreadPoolExecutor) was spawned, the function conconcurrent_f(), inside that thread, needed to wait for the variable result to be processed by the previous thread completely, and the program as a whole was thus executed in a sequential manner, nonetheless.

So, while there was no actual concurrency involved in the second method, the overhead cost of spawning new threads contributed to the significantly worse execution time. This is one example of inherently sequential tasks, where concurrency or parallelism should not be applied to attempt an improvement in execution time.

I/O bound

Another way to think about sequentiality is the concept (in computer science) of a condition called I/O bound, in which the time it takes to complete a computation is mainly determined by the time spent waiting for input/output (I/O) operations to be completed. This condition arises when the rate at which data is requested is slower than the rate at which it is consumed, or, in short, more time is spent requesting data than processing it.

In an I/O bound state, the CPU must stall its operation, waiting for data to be processed. This means that, even if the CPU gets faster at processing data, processes tend to not increase in speed in proportion to the increased CPU speed, since they get more I/O-bound. With faster computation speed being the primary goal of new computer and processor designs, I/O bound states are becoming undesirable, yet more and more common, in programs.

As you have seen, there are a number of situations in which the application of concurrent programming results in decreased processing speed, and they should thus be avoided. It is therefore important for us to not see concurrency as a golden ticket that can produce unconditionally better execution times, and to understand the differences between the structures of programs that benefit from concurrency and programs that do not.

You have been reading a chapter from
Mastering Concurrency in Python
Published in: Nov 2018
Publisher: Packt
ISBN-13: 9781789343052
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime