Scaling Across CPUs
Learn to scale applications across CPUs.
Multithreading
Scaling across processors is usually done using multithreading. Multithreading is the ability to run code in parallel using threads. Threads are usually provided by the operating system and are contained in a single process. The operating system is responsible for scheduling their execution.
Because threads are scheduled independently, they can be executed on separate processors even though they are contained in a single process. If only one CPU is available, however, the threads are interleaved and their code runs sequentially.
Therefore, when writing a multithreaded application, the code always runs concurrently but runs in parallel only if there is more than one CPU available.
This means that multithreading looks like a good way to scale and parallelize your application on one computer. When you want to spread the workload, you start a new thread for each new request instead of handling them one at a time.
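As a sketch of that pattern, the snippet below starts one thread per incoming request instead of handling them one at a time. The handle_request function and the request names are hypothetical placeholders:

```python
import threading

def handle_request(request, results):
    # Hypothetical request handler: here it just records a response.
    results.append("processed %s" % request)

results = []
threads = []
for request in ("req-1", "req-2", "req-3"):
    # One thread per request instead of handling them sequentially.
    t = threading.Thread(target=handle_request, args=(request, results))
    t.start()
    threads.append(t)

# Wait for all requests to finish.
for t in threads:
    t.join()

print(sorted(results))
```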
Drawbacks of multithreading
However, this does have several drawbacks in Python. If you have been in the Python world for a long time, you have probably encountered the word GIL, and know how hated it is. The GIL is the Python global interpreter lock, a lock that must be acquired each time CPython needs to execute bytecode. Unfortunately, this means that if you try to scale your application by making it run multiple threads, this global lock limits the performance of your code: all your threads contend for it, each trying to grab it as soon as it needs to execute Python instructions.
The reason that the GIL is required in the first place is that it makes sure that some basic Python objects are thread-safe. For example, the code in the following example would not be thread-safe without the global Python lock.
```python
import threading

x = []

def append_two(l):
    l.append(2)

threading.Thread(target=append_two, args=(x,)).start()

x.append(1)

print(x)
```
That code always prints either [1, 2] or [2, 1]. While there is no way to know which thread appends its value before the other, there is an assumption built into Python that each list.append operation is atomic. If it were not atomic, memory corruption could arise and the list might simply contain [1] or [2].
This phenomenon happens because only one thread is allowed to execute a bytecode instruction at a time. That also means that if your threads run a lot of bytecode, there is heavy contention to acquire the GIL, and therefore your program cannot be faster than a single-threaded version; it could even be slower.
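A rough way to observe this is to run the same CPU-bound function twice serially and then in two threads. The exact timings depend on your machine and Python version, but with the GIL the threaded version is usually no faster:

```python
import threading
import time

def count_down(n):
    # Pure CPU-bound work: holds the GIL for almost its entire run.
    while n > 0:
        n -= 1

N = 5_000_000

# Serial: run the work twice, one after the other.
start = time.perf_counter()
count_down(N)
count_down(N)
serial = time.perf_counter() - start

# Threaded: run the same two workloads concurrently.
t1 = threading.Thread(target=count_down, args=(N,))
t2 = threading.Thread(target=count_down, args=(N,))
start = time.perf_counter()
t1.start()
t2.start()
t1.join()
t2.join()
threaded = time.perf_counter() - start

print("serial: %.2fs, threaded: %.2fs" % (serial, threaded))
```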
Thread-safe operations
The easiest way to know if an operation is thread-safe is to check whether it translates to a single atomic operation at the bytecode level. The standard dis module lets you disassemble a function and inspect the bytecode instructions it executes.
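For example, list.append executes as a single atomic operation under CPython's GIL, so many threads can append to the same list without corrupting it. A small sketch, using dis to show the bytecode and a hundred threads to exercise the atomicity:

```python
import dis
import threading

def add_one(l):
    l.append(1)

# Disassemble to inspect which bytecode instructions the function runs.
dis.dis(add_one)

shared = []
threads = [threading.Thread(target=add_one, args=(shared,))
           for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every append survived: the list was never corrupted.
print(len(shared))  # → 100
```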
So, while using threads seems like an ideal solution at first glance, most multithreaded applications I have seen struggle to reach 150% CPU usage, that is to say, 1.5 cores used. With computing nodes nowadays rarely having fewer than four or eight cores, it is a shame. Blame the GIL.
Removing GIL?
There is currently an effort underway (named gilectomy) to remove the GIL in CPython. Whether this effort will pay off is still unknown, but it is exciting to follow and see how far it will go.
However, CPython is just one, although the most common, of the available Python implementations. Jython, for example, doesn’t have a global interpreter lock, which means that it can run multiple threads in parallel efficiently. Unfortunately, these projects by their very nature lag behind CPython, and so they are not useful targets.
Global variables - an infinite source of human errors
Multithreading involves several traps, and one of them is that all the pieces of code running concurrently are sharing the same global environment and variables. Reading or writing global variables should be done exclusively by using techniques such as locking, which complicates your code. Moreover, it is an infinite source of human errors.
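The usual locking technique looks like the following sketch of a shared counter: every read-modify-write of the global variable is guarded by a threading.Lock, because the increment itself is not atomic:

```python
import threading

counter = 0
counter_lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        # Read-modify-write is not a single atomic operation,
        # so it must be protected by the lock.
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(10000,))
           for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # → 50000
```

Forgetting the lock in even one place that touches the variable reintroduces the race, which is exactly why this style of code is so error-prone.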
Getting multithreaded applications right is hard. The level of complexity means that it is a large source of bugs. Considering the little there is to be gained in general, it is better not to waste too much effort on it.
Multiple processes
So, are we back to our initial use cases, with no real solution on offer? Not true! There is another approach you can use: multiple processes. As we will see in this lesson, it is more efficient and easier to get right. It is also the first step before spreading the workload across a network.