Multiprocessing Pools
Learn about the significance of interprocess communication, data serialization, and efficient multiprocessing in Python.
Overview
Because each process is kept meticulously separate by the operating system, interprocess communication becomes an important consideration. We need to
pass data between these separate processes. One really common example is having one process write a file that another process can read. When the two processes
are reading and writing a file, and running concurrently, we have to be sure the reader is waiting for the writer to produce data. The operating system pipe structure can accomplish this. Within the shell, we can write ps -ef | grep python
and
pass output from the ps
command to the grep
command. The two commands run concurrently. For Windows PowerShell users, there are similar kinds of pipeline processing, using different command names.
Data serialization
The multiprocessing
package provides some additional ways to implement interprocess communication. Pools can seamlessly hide the way data is moved between processes. Using a pool looks much like a function call: you pass data into a function, it is executed in another process or processes, and when the work is done, a value is returned. It is important to understand how much work is being done to support this: objects in one process are pickled and passed into an operating system process pipe. Then, another ...