Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds
Arrow up icon
GO TO TOP
Distributed Computing with Python

You're reading from   Distributed Computing with Python Harness the power of multiple computers using Python through this fast-paced informative guide

Arrow left icon
Product type Paperback
Published in Apr 2016
Publisher
ISBN-13 9781785889691
Length 170 pages
Edition 1st Edition
Languages
Arrow right icon
Author (1):
Arrow left icon
Rasheedh B Rasheedh B
Author Profile Icon Rasheedh B
Rasheedh B
Arrow right icon
View More author details
Toc

Table of Contents (10) Chapters Close

Preface 1. An Introduction to Parallel and Distributed Computing FREE CHAPTER 2. Asynchronous Programming 3. Parallelism in Python 4. Distributed Applications – with Celery 5. Python in the Cloud 6. Python on an HPC Cluster 7. Testing and Debugging Distributed Applications 8. The Road Ahead Index

Distributed computing

For the remainder of this book, we will adopt the following working definition of distributed computing:

Distributed computing is the simultaneous use of more than one computer to solve a problem.

Typically, as in the case of parallel computing, this definition is oftentimes further restricted. The restriction usually is the requirement that these computers appear to their users as a single machine, therefore hiding the distributed nature of the application. In this book, we will be happy with the more general definition.

Distributing computation across multiple computers is again a pretty obvious strategy when using systems that are able to speak to each other over the (local or otherwise) network. In many respects, in fact, this is just a generalization of the concepts of parallel computing that we saw in the previous section.

Reasons to build distributed systems abound. Oftentimes, the reason is the ability to tackle a problem so big that no individual computer could handle it at all, or at least, not in a reasonable amount of time. An interesting example from a field that is probably familiar to most of us is the rendering of 3D animation movies, such as those from Pixar and DreamWorks.

Given the sheer number of frames to render for a full-length feature (30 frames per second on a two-hour movie is a lot!), movie studios need to spread the full-rendering job to large numbers of computers (computer farms as they are called).

Other times, the very nature of the application being developed requires a distributed system. This is the case, for instance, for instant messaging and video conferencing applications. For these pieces of software, performance is not the main driver. It is just that the problem that the application solves is itself distributed.

In the following figure, we see a very common web application architecture (another example of a distributed application), where multiple users connect to the website over the network. At the same time, the application itself communicates with systems (such as a database server) running on different machines in its LAN:

Distributed computing

Another interesting example of distributed systems, which might be a bit counterintuitive, is the CPU-GPU combination. These days, graphics cards are very sophisticated computers in their own right. They are highly parallel and offer compelling performance for a large number of compute-intensive problems, not just for displaying images on screen. Tools and libraries exist to allow programmers to make use of GPUs for general-purpose computing (for example CUDA from NVIDIA, OpenCL, and OpenAcc among others).

However, the system composed by the CPU and GPU is really an example of a distributed system, where the network is replaced by the PCI bus. Any application exploiting both the CPU and the GPU needs to take care of data movement between the two subsystems just like a more traditional application running across the network!

It is worth noting that, in general, adapting the existing code to run across computers on a network (or on the GPU) is far from a simple exercise. In these cases, I find it quite helpful to go through the intermediate step of using multiple processes on a single computer first (refer to the discussion in the previous section). Python, as we will see in Chapter 3, Parallelism in Python, has powerful facilities for doing just that (refer to the concurrent.futures module).

Once I evolve my application so that it uses multiple processes to perform operations in parallel, I start thinking about how to turn these processes into separate applications altogether, which are no longer part of my monolithic core.

Special attention must be given to the data—where to store it and how to access it. In simple cases, a shared filesystem (for example, NFS on Unix systems) is enough; other times, a database and/or a message bus is needed. We will see some concrete examples from Chapter 4, Distributed Applications – with Celery, onwards. It is important to remember that, more often than not, data, rather than CPU, is the real bottleneck.

You have been reading a chapter from
Distributed Computing with Python
Published in: Apr 2016
Publisher:
ISBN-13: 9781785889691
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image