Using multiprocessing pools and tasks
Concurrency is a form of non-strict evaluation: the exact order of operations is unpredictable. The multiprocessing package introduces the concept of a Pool object. A Pool object contains a number of worker processes and expects these processes to be executed concurrently. This package allows OS scheduling and time slicing to interleave execution of multiple processes. The intention is to keep the overall system as busy as possible.
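As a minimal sketch of the idea, the following creates a Pool of worker processes and hands it a batch of independent tasks. The names square and parallel_squares, and the choice of two workers, are illustrative assumptions rather than anything prescribed by the package.

```python
import multiprocessing


def square(n):
    """CPU-bound stand-in for real work; runs inside a worker process."""
    return n * n


def parallel_squares(numbers, processes=2):
    """Distribute the numbers across a pool of worker processes."""
    with multiprocessing.Pool(processes=processes) as pool:
        # map() blocks until every task is done and returns the results
        # in input order, even though the workers ran concurrently.
        return pool.map(square, numbers)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this demo
    # when the module is re-imported under the "spawn" start method.
    print(parallel_squares([1, 2, 3, 4]))
```

The worker function is defined at module level so that it can be pickled and sent to the worker processes.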
To make the most of this capability, we need to decompose our application into components for which non-strict concurrent execution is beneficial. The overall application must be built from discrete tasks that can be processed in an indefinite order.
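One way to see what "indefinite order" means in practice is the imap_unordered() method of a Pool, which yields each result as soon as its task finishes. In this sketch, word_count and count_all are hypothetical names; each task returns its input alongside its result so the caller can match them up regardless of arrival order.

```python
import multiprocessing


def word_count(line):
    """One discrete task: it depends on nothing but its own argument."""
    return line, len(line.split())


def count_all(lines, processes=2):
    """Run the tasks concurrently and collect results as they complete."""
    with multiprocessing.Pool(processes=processes) as pool:
        # imap_unordered() yields results in completion order, which
        # may differ from the order of the input lines.
        return list(pool.imap_unordered(word_count, lines))


if __name__ == "__main__":
    for line, count in count_all(["a b", "c d e", "f"]):
        print(count, line)
```

Because the tasks share no state, the caller can sort or aggregate the results afterwards without caring which worker finished first.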
An application that gathers data from the internet through web scraping, for example, is often optimized through parallel processing. We can create a Pool object of several identical workers, which implement the website scraping. Each worker is assigned tasks in the form of URLs...
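A rough sketch of such a scraper might look like the following. The functions fetch_size and scrape are hypothetical, the worker only measures the size of each response body rather than parsing it, and the timeout and pool size are arbitrary assumptions.

```python
import multiprocessing
import urllib.error
import urllib.request


def fetch_size(url):
    """Worker task: download one URL and report the size of its body."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return url, len(response.read())
    except (urllib.error.URLError, ValueError, OSError):
        # Report a failed or malformed URL instead of crashing the worker.
        return url, None


def scrape(urls, processes=4):
    """Assign each URL to whichever identical worker is free next."""
    with multiprocessing.Pool(processes=processes) as pool:
        # Results arrive in completion order; the dict is keyed by URL,
        # so the arrival order does not matter to the caller.
        return dict(pool.imap_unordered(fetch_size, urls))


if __name__ == "__main__":
    # Placeholder URLs; substitute the pages you actually want to fetch.
    for url, size in scrape(["https://example.com/", "not-a-url"]).items():
        print(url, size)
```

Scraping is mostly I/O-bound, so the number of workers here can reasonably exceed the number of CPU cores; the right value depends on the target sites and network.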