Problems with Multiprocessing
Learn about the interprocess communication challenges and process independence advantages in Python’s multiprocessing.
Cost of interprocess communication
As with threads, multiprocessing also has problems, some of which we have already discussed. Sharing data between processes is costly. As we have discussed, all communication between processes, whether by queues, OS pipes, or even shared memory, requires serializing the objects. Excessive serialization can dominate processing time. Shared memory objects can help by limiting the serialization to the initial setup of the shared memory. Multiprocessing works best when relatively small objects are passed between processes and a tremendous amount of work needs to be done on each one.
Optimizing with shared memory
Using shared memory can avoid the cost of repeated serialization and deserialization. There are numerous limitations on the kinds of Python objects that can be shared. Shared memory can help performance, but can also lead to somewhat more complex-looking Python objects.
The other major problem with multiprocessing is that, like threads, it can be hard to tell which process a variable or method is being accessed in. In multiprocessing, the worker processes inherit a great deal of data from the parent process. This isn’t shared, it’s a one-time copy. A child can be given a copy of a mapping or a list and mutate the object. The parent won’t see the effects of the child’s mutation.
Advantages of process independence
A big advantage of multiprocessing is the absolute independence of processes. We don’t need to carefully manage locks, because the data is not shared. Additionally, the internal operating system limits on numbers of open files are allocated at the process level; we can have a large number of resource-intensive processes.
When designing concurrent applications, the focus is on maximizing the use of the CPU to do as much work in as short a time as possible. With so many choices, we always need to examine the problem to figure out which of the many available solutions is the best one for that problem.
Note: The notion of concurrent processing is too broad for there to be one right way to do it. Each distinct problem has a best solution. It’s important to write code in a way that permits adjustment, tuning, and optimization.
Threads vs. processes in Python concurrency
We’ve looked at the two principal tools for concurrency in Python: threads and processes. Threads exist within a single OS process, sharing memory and other resources. Processes are independent of each other, which makes interprocess communication a necessary overhead. Both of these approaches are amenable to the concept of a pool of concurrent workers waiting to work and providing results at some unpredictable time in the future. This abstraction of results becoming available in the future is what shapes the concurrent.futures
module.
Get hands-on with 1400+ tech skills courses.