In the last chapter, we introduced the concept of parallel processing and learned how to leverage multicore processors and GPUs. Now, we can step up our game and turn our attention to distributed processing, which involves executing tasks across multiple machines to solve a given problem.
In this chapter, we will illustrate the challenges, use cases, and examples of how to run code on a cluster of computers. Python offers easy-to-use and reliable packages for distributed processing, which will allow us to implement scalable and fault-tolerant code with relative ease.
The list of topics for this chapter is as follows:
- Distributed computing and the MapReduce model
- Directed Acyclic Graphs with Dask
- Writing parallel code with Dask's array, Bag, and DataFrame data structures
- Distributing parallel algorithms with Dask Distributed
- An introduction to PySpark
- Spark's Resilient Distributed Datasets