
Multiprocessing Pools

Explore how to use Python's multiprocessing pools to run functions in parallel across multiple CPU cores. Understand data serialization, process communication, and methods like map and apply_async to handle asynchronous and dynamic job submissions for concurrent programming.

Overview

Because each process is kept meticulously separate by the operating system, interprocess communication becomes an important consideration: we need a way to pass data between these separate processes. One common example is having one process write a file that another process reads. When the two processes run concurrently, we have to be sure the reader waits for the writer to produce data. The operating system pipe structure can accomplish this. Within the shell, we can write ps -ef | grep python to pass output from the ps command to the grep command; the two commands run concurrently. Windows PowerShell offers similar pipeline processing, using different command names.

Data serialization

The multiprocessing package provides some additional ways to implement interprocess communication. Pools can seamlessly hide the way data is moved between processes. Using a pool looks much like a function call: you pass data into a function, the work is executed in one or more worker processes, and when it is done, a value is returned. It is important to understand how much work happens behind the scenes to support this: objects in one process are pickled and passed into an operating system pipe. Then, another ...