Distributed Computing with Python

You're reading from  Distributed Computing with Python

Product type Book
Published in Apr 2016
ISBN-13 9781785889691
Pages 170 pages
Edition 1st Edition

Parallel computing


Definitions of parallel computing abound. For the purposes of this book, however, a simple definition suffices:

Parallel computing is the simultaneous use of more than one processor to solve a problem.

Typically, this definition is further specialized by requiring that the processors reside on the same motherboard. This is mostly to distinguish parallel computing from distributed computing (which is discussed in the next section).

The idea of splitting work among many workers is as old as human civilization and is not restricted to the digital world. It finds an immediate and obvious application in modern computers, which are equipped with ever-increasing numbers of compute units.

There are, of course, many reasons why parallel computing might be useful and even necessary. The simplest one is performance: if we can break up a long-running computation into smaller chunks and parcel them out to different processors, then we can do more work in the same amount of time.
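As a minimal sketch of this idea, assuming CPU-bound work that splits into independent chunks, Python's standard `multiprocessing.Pool` can parcel the chunks out to worker processes and collect the partial results. The function names, the sum-of-squares workload, and the chunk size are illustrative assumptions, not code from this chapter:

```python
from multiprocessing import Pool


def sum_of_squares(bounds):
    """Compute sum(i * i for i in range(start, stop)) for one chunk."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n, n_workers=4, chunk_size=250_000):
    """Split range(n) into chunks and process them on n_workers processes."""
    # Hypothetical chunking scheme: contiguous [start, stop) ranges.
    chunks = [(start, min(start + chunk_size, n))
              for start in range(0, n, chunk_size)]
    with Pool(processes=n_workers) as pool:
        # Each chunk is sent to some worker process; results come back in order.
        partial_sums = pool.map(sum_of_squares, chunks)
    return sum(partial_sums)


if __name__ == "__main__":
    n = 1_000_000
    serial = sum_of_squares((0, n))          # one processor does everything
    parallel = parallel_sum_of_squares(n)    # work split across processes
    assert serial == parallel
    print(parallel)
```

The `if __name__ == "__main__":` guard matters: on platforms that start workers by re-importing the main module, unguarded pool creation would be re-executed in every child. Whether the parallel version is actually faster depends on the chunk size, the number of cores, and the cost of sending data between processes.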

Other times, and just as often, parallel computing techniques are used to present users with responsive interfaces while the system is busy with some other task. Remember that one processor executes just one task at a time. Applications with GUIs need to offload work to a separate thread of execution running on another processor so that one processor remains free to update the GUI and respond to user input.

The following figure illustrates this common architecture, where the main thread is processing user and system inputs using what is called an event loop. Tasks that require a long time to execute and those that would otherwise block the GUI are offloaded to a background or worker thread:

A simple real-world example of this parallel architecture could be a photo organization application. When we connect a digital camera or a smartphone to our computers, the photo application needs to perform a number of actions, all while keeping its user interface interactive. For instance, our application needs to copy images from the device to the internal disk, create thumbnails, extract metadata (for example, the date and time of the shot), index the images, and finally update the image gallery. While all of this happens, we are still able to browse images that have already been imported, open them, edit them, and so on.

Of course, all these actions could very well be performed sequentially on a single processor, the same one that is handling the GUI. The drawback would be a sluggish interface and a much slower application overall. Performing these steps in parallel keeps the application snappy and its users happy.
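The event-loop/worker-thread architecture can be sketched without a real GUI. In the example below, which is a simplified model rather than how any particular GUI toolkit works, a `queue.Queue` stands in for the toolkit's event queue, the photo import is simulated with `time.sleep`, and all function and event names are hypothetical. The main thread keeps handling "user" events while a worker thread performs the slow import and posts progress back to the loop:

```python
import queue
import threading
import time


def import_photos(photos, events):
    """Worker thread: slow per-photo work, reporting progress as events."""
    for name in photos:
        time.sleep(0.05)  # stand-in for copying, thumbnailing, indexing
        events.put(("imported", name))
    events.put(("import_done", None))


def run_event_loop(photos, user_clicks):
    """Main thread: a toy event loop that never blocks on the import itself."""
    events = queue.Queue()  # thread-safe stand-in for a GUI event queue
    gallery, log = [], []

    # Offload the long-running task to a background (worker) thread.
    worker = threading.Thread(target=import_photos, args=(photos, events),
                              daemon=True)
    worker.start()

    # Simulated user input arriving while the import is still running.
    for click in user_clicks:
        events.put(("click", click))

    done = False
    while not done or not events.empty():
        kind, payload = events.get()  # wait for the next event
        if kind == "click":
            log.append(f"handled click on {payload}")  # UI stays responsive
        elif kind == "imported":
            gallery.append(payload)  # update the gallery incrementally
        elif kind == "import_done":
            done = True
    worker.join()
    return gallery, log


if __name__ == "__main__":
    gallery, log = run_event_loop(["a.jpg", "b.jpg", "c.jpg"],
                                  ["thumbnail 1", "menu"])
    print(gallery)
    print(log)
```

In a real application the loop is provided by the GUI toolkit, and results are usually handed back to the GUI thread through whatever thread-safe mechanism that toolkit offers; the queue here only models that hand-off.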

The astute reader might jump up at this point and rightfully point out that older computers, with a single processor and a single core, could already perform multiple things at the same time (by way of multitasking). What happened back then (and still happens today, whenever we launch more tasks than there are processors and cores on our computers) was that the running task gave up the CPU, either voluntarily or forcibly by the OS (for example, in response to an I/O event), so that another task could run in its place. These switches happened over and over again, with various tasks acquiring and giving up the CPU many times over the course of the application's life. Because the switches were extremely fast, users had the impression of multiple tasks running concurrently. In reality, however, only one task was running at any given time.
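A toy model can make this interleaving concrete. The sketch below assumes purely cooperative (voluntary) switching and uses Python generators as stand-ins for tasks; preemption by the OS is not modeled, and the `task` and `run_round_robin` functions are hypothetical illustrations rather than how a real OS scheduler is implemented. Exactly one task runs at any moment, yet the trace shows their progress interleaved:

```python
from collections import deque


def task(name, steps):
    """A 'task' that does one slice of work per step, then yields the CPU."""
    for i in range(steps):
        yield f"{name} step {i}"  # voluntarily give up control


def run_round_robin(tasks):
    """Run tasks one at a time, switching after every slice (round-robin)."""
    ready = deque(tasks)
    trace = []
    while ready:
        current = ready.popleft()  # only this task is running right now
        try:
            trace.append(next(current))
        except StopIteration:
            continue  # task finished; drop it from the ready queue
        ready.append(current)  # back of the queue: let another task run
    return trace


if __name__ == "__main__":
    for line in run_round_robin([task("A", 3), task("B", 2)]):
        print(line)
```

Running it prints `A step 0`, `B step 0`, `A step 1`, `B step 1`, `A step 2`: the two tasks appear to progress together, although at no point do both execute simultaneously.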

The typical tools used in parallel applications are threads. In environments such as Python, where threads have significant limitations (as we will see in Chapter 3, Parallelism in Python), programmers resort to launching subprocesses instead, oftentimes by means of forking. These subprocesses replace (or complement) threads and run alongside the main application process.

The first technique is called multithreaded programming; the second is called multiprocessing. It is worth noting that multiprocessing should not be seen as inferior to multithreading, nor as a workaround for it.

There are many situations where multiprocessing is preferable to multiple threads. Interestingly, even though both run on a single computer, a multithreaded application is an example of a shared-memory architecture, whereas a multiprocess application is an example of a distributed-memory architecture (see the following section for more details).
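The difference between the two architectures shows up as soon as workers try to modify the same variable. The following sketch, assuming CPython's standard `threading` and `multiprocessing` modules, has each worker append to a module-level list; the function names are illustrative. Threads all modify the one list in the process's shared memory, while each child process modifies its own private copy, so the parent sees no change:

```python
import multiprocessing
import threading

results = []  # module-level state that every worker tries to modify


def work(item):
    results.append(item * item)


def run_with_threads(items):
    """Shared memory: all threads see and modify the same `results` list."""
    results.clear()
    threads = [threading.Thread(target=work, args=(i,)) for i in items]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)


def run_with_processes(items):
    """Distributed memory: each child process appends to its own copy."""
    results.clear()
    procs = [multiprocessing.Process(target=work, args=(i,)) for i in items]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # The parent's list is untouched; data would have to be sent back
    # explicitly (for example, through a multiprocessing.Queue).
    return sorted(results)


if __name__ == "__main__":
    print("threads:  ", run_with_threads([1, 2, 3]))
    print("processes:", run_with_processes([1, 2, 3]))
```

The thread version returns the squares, while the process version returns an empty list. This is exactly why multiprocess programs, like distributed ones, need explicit communication between workers.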
