You're reading from Distributed Computing with Python

Product type Book

Published in Apr 2016

Publisher

ISBN-13 9781785889691

Pages 170 pages

Edition 1st Edition

Languages

Python

Concepts

Distributed Computing

Table of Contents (15) Chapters

Distributed Computing with Python

Credits

About the Author

About the Reviewer

www.PacktPub.com

Preface

1. An Introduction to Parallel and Distributed Computing

2. Asynchronous Programming

3. Parallelism in Python

4. Distributed Applications – with Celery

5. Python in the Cloud

6. Python on an HPC Cluster

7. Testing and Debugging Distributed Applications

8. The Road Ahead

Index

Parallel computing

Definitions of parallel computing abound. For the purposes of this book, however, a simple one will suffice:

Parallel computing is the simultaneous use of more than one processor to solve a problem.

Typically, this definition is further specialized by requiring that the processors reside on the same motherboard. This is mostly to distinguish parallel computing from distributed computing (which is discussed in the next section).

The idea of splitting work among many workers is as old as human civilization and is not restricted to the digital world. It finds an immediate and obvious application in modern computers, which are equipped with ever-increasing numbers of compute units.

There are, of course, many reasons why parallel computing might be useful and even necessary. The simplest one is performance; if we can indeed break up a long-running computation into smaller chunks and parcel them out to different processors, then we can do more work in the same amount of time.
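As a minimal sketch of this idea (not code from the book), the following splits a sum of squares into chunks and hands each chunk to a worker. The names `sum_of_squares`, `split`, and `parallel_sum_of_squares` are hypothetical, and the executor is passed in as a parameter so that the same function can be driven by a pool of processes or of threads.

```python
from concurrent.futures import Executor, ProcessPoolExecutor


def sum_of_squares(chunk):
    """Work done by a single worker on its share of the data."""
    return sum(x * x for x in chunk)


def split(data, n_chunks):
    """Split data into at most n_chunks contiguous pieces of similar size."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum_of_squares(data, executor: Executor, n_chunks=4):
    """Parcel the chunks out to the executor's workers and combine the partial results."""
    partials = executor.map(sum_of_squares, split(data, n_chunks))
    return sum(partials)


if __name__ == "__main__":
    numbers = list(range(1_000_000))
    # Assumption: four worker processes; tune this to the number of available cores.
    with ProcessPoolExecutor(max_workers=4) as pool:
        print(parallel_sum_of_squares(numbers, pool))
```

Whether this actually runs faster than a plain loop depends on how the cost of each chunk compares with the overhead of shipping data to the workers, so it is worth measuring before committing to the design.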

Other times, and just as often, parallel computing techniques are used to present users with responsive interfaces while the system is busy with some other task. Remember that one processor executes just one task at a time. Applications with GUIs need to offload work to a separate thread of execution running on another processor so that one processor is free to update the GUI and respond to user inputs.

The following figure illustrates this common architecture, where the main thread is processing user and system inputs using what is called an event loop. Tasks that require a long time to execute and those that would otherwise block the GUI are offloaded to a background or worker thread:

A simple real-world example of this parallel architecture could be a photo organization application. When we connect a digital camera or a smartphone to our computers, the photo application needs to perform a number of actions; all the while its user interface needs to stay interactive. For instance, our application needs to copy images from the device to the internal disk, create thumbnails, extract metadata (for example, date and time of the shot), index the images, and finally update the image gallery. While all of this happens, we are still able to browse images that are already imported, open them, edit them, and so on.

Of course, all these actions could very well be performed sequentially on a single processor—the same processor that is handling the GUI. The drawback would be a sluggish interface and an extremely slow overall application. Performing these steps in parallel keeps the application snappy and its users happy.
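The photo application described above can be sketched, under heavy simplification, as a main loop that consumes events from a queue while a worker thread performs the slow import. This is an illustrative sketch rather than the book's code: `import_photos` and `run_event_loop` are hypothetical names, and `time.sleep` stands in for the real copying, thumbnailing, and indexing work.

```python
import queue
import threading
import time


def import_photos(names, events):
    """Worker thread: simulate the slow import steps and report progress."""
    for name in names:
        time.sleep(0.01)  # stand-in for copying, thumbnailing, and indexing
        events.put(("imported", name))
    events.put(("done", None))


def run_event_loop(names):
    """Main thread: keep handling events while the import runs in the background."""
    events = queue.Queue()
    worker = threading.Thread(target=import_photos, args=(names, events), daemon=True)
    worker.start()

    gallery = []
    while True:
        try:
            kind, payload = events.get(timeout=0.05)
        except queue.Empty:
            continue  # a real GUI would process user input here
        if kind == "imported":
            gallery.append(payload)  # update the gallery as each photo arrives
        elif kind == "done":
            break
    worker.join()
    return gallery


if __name__ == "__main__":
    print(run_event_loop(["a.jpg", "b.jpg", "c.jpg"]))
```

The worker never touches the gallery directly. It only posts messages to the queue, which keeps all interface updates on the main thread.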

The astute reader might jump up at this point and rightfully point out that older computers, with a single processor and a single core, could already do multiple things at the same time (by way of multitasking). What happened back then (and still happens today, when we launch more tasks than there are processors and cores on our computers) was that the one running task gave up the CPU, either voluntarily or because the OS forced it to (for example, in response to an I/O event), so that another task could run in its place. These switches would happen over and over again, with various tasks acquiring and giving up the CPU many times over the course of the application's life. Because the switches were extremely fast, users had the impression of multiple tasks running concurrently. In reality, however, only one task was running at any given time.
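To make the single-processor case concrete, here is a toy round-robin scheduler built on generators. It is a sketch that models only the voluntary case, where a task gives up the CPU by itself at each `yield`; preemption by the OS is not modeled. The names `task` and `round_robin` are hypothetical.

```python
from collections import deque


def task(name, steps):
    """A task that does one unit of work, then voluntarily gives up the CPU."""
    for i in range(steps):
        yield f"{name}:{i}"  # each yield is a point where another task may run


def round_robin(tasks):
    """Run tasks one at a time, switching to the next task at every yield."""
    ready = deque(tasks)
    trace = []
    while ready:
        current = ready.popleft()
        try:
            trace.append(next(current))  # exactly one task runs at any moment
        except StopIteration:
            continue  # the task has finished; drop it
        ready.append(current)  # otherwise send it to the back of the queue
    return trace


if __name__ == "__main__":
    # prints ['A:0', 'B:0', 'A:1', 'B:1', 'A:2']
    print(round_robin([task("A", 3), task("B", 2)]))
```

The interleaved trace gives the impression that A and B progress together, even though the scheduler only ever advances one of them at a time.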

The typical tools used in parallel applications are threads. In languages such as Python (as we will see in Chapter 3, Parallelism in Python), where threads have significant limitations, programmers resort to launching subprocesses instead (oftentimes by means of forking). These subprocesses replace (or complement) threads and run alongside the main application process.

The first technique is called multithreaded programming. The second is called multiprocessing. It is worth noting that multiprocessing should not be seen as inferior or as a workaround with respect to using multiple threads.

There are many situations where multiprocessing is preferable to multiple threads. Interestingly, even though both run on a single computer, a multithreaded application is an example of a shared-memory architecture, whereas a multiprocess application is an example of a distributed-memory architecture (refer to the following section to learn more).
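The difference between the two memory models can be observed with a small experiment. The sketch below is illustrative only, and `counter`, `increment`, `run_in_thread`, and `run_in_process` are hypothetical names. It applies the same mutation once in a thread and once in a child process, then inspects the value seen by the parent.

```python
import multiprocessing
import threading

counter = {"value": 0}  # module-level state owned by the main process


def increment():
    """Mutate the module-level counter in whatever context this runs."""
    counter["value"] += 1


def run_in_thread():
    """A thread shares the parent's memory, so its change is visible afterwards."""
    t = threading.Thread(target=increment)
    t.start()
    t.join()
    return counter["value"]


def run_in_process():
    """A child process works on its own copy; the parent's counter is untouched."""
    p = multiprocessing.Process(target=increment)
    p.start()
    p.join()
    return counter["value"]


if __name__ == "__main__":
    print("after thread: ", run_in_thread())
    print("after process:", run_in_process())
```

The thread's increment shows up in the parent, while the child process's increment does not, so the second call reports the same value as the first. To get a result back from a child process, it has to be sent explicitly, for example through a `multiprocessing.Queue` or a pipe.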
