Detailed Design of a Many-core System
Learn the detailed design of a many-core key-value store.
Now, we will look into the detailed design of the solution to the problems we have while implementing a key-value store on a many-core system. We'll be discussing the solution to these three problems:
Memory limitation of TilePro64
Limitations of a multi-threaded Memcached
Allocation of tasks to cores
Solution: Limitations of memory and Memcached
Our first challenge is that TilePro64 has a 32-bit instruction set, and we cannot assign more than 4GB of virtual address space to a single process. The second challenge identified how the use of global locks hurt the performance of our key-value store. Both problems can be solved by implementing a version of Memcached that supports multiple processes:
Use multiple processes to access the key-value store in their own address space and overcome the memory limitation.
Use
(a direct consequence of using multiple processes) to parallelize data access and stop the use of locks within a shard.data sharding Breaking data into smaller chunks.
Use multiple processes
When using domain-specific processors, we run into unique problems like the TilePro64 having only a 32-bit virtual address space. So, rather than using a single process that uses multiple threads, we will use those threads to communicate with independent processes that contain the key-value data shard in their own dedicated address space. This will allow us to overcome the limitation of the 32-bit virtual address space, and each process will run its operations in serial while also allowing us to avoid inter-thread synchronizations.
Modifying Memcached
We have a multithreaded key-value store which unfortunately needs locks. Locks limit parallel processing in our system. As we allow each process to have its own dedicated address space to overcome the 4GB memory limit, it also allows us to stop using any locking protection. This is because all requests directed to a hashtable are handled by a single process, which is already serialized.
Flow of requests
After implementing the new multi-process version of Memcached, the following is the new flow of requests:
A network core receives a request and passes it to a worker thread.
The worker thread uses modulus arithmetic to identify the relevant shard process.
The worker thread writes the request's information into a shared memory, which is both accessible by the thread and the process. Shared memory is used to store request data until its process becomes available to process it.
Next, the worker thread notifies the shard process to
STORE
orGET
request.The shard process can then directly send the reply back to the client.
Level up your interview prep. Join Educative to access 70+ hands-on prep courses.