Limited Parallel Execution
Learn how to limit concurrency when running tasks in parallel.
Spawning parallel tasks without control can often lead to excessive load. Imagine having thousands of files to read, URLs to access, or database queries to run in parallel. A common problem in such situations is running out of resources. The most common example is when an application tries to open too many files at once, utilizing all the file descriptors available to the process.
A server that spawns unbounded parallel tasks to handle a user request could be exploited with a denial-of-service (DoS) attack. In such an attack, a malicious actor forges one or more requests that push the server to consume all its available resources and become unresponsive. Limiting the number of parallel tasks is, in general, a good practice that helps build resilient applications.
Version 3 of our web spider doesn’t limit the number of parallel tasks and is therefore susceptible to crashing in a number of cases. For instance, if we run it against a sufficiently large website, we might see it run for a few seconds and then fail with the ECONNREFUSED error code. When we download too many pages concurrently from a web server, the server might start rejecting new connections from the same IP address. In this case, our spider will simply crash, and we’ll be forced to relaunch the process if we want to continue crawling the website. We could handle ECONNREFUSED to stop the process from crashing, but we’d still risk allocating too many parallel tasks and might run into other issues.
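As a rough illustration of that stopgap, the sketch below wraps a callback-style download function so that a refused connection is logged and skipped instead of being propagated. The names downloadTolerant and download are hypothetical and aren't part of the spider's code. The only behavior relied on is that Node.js system errors expose the error name in their code property. Note that this only hides the symptom, since nothing here limits how many downloads run at once.

```javascript
// Hypothetical wrapper (not part of the spider): `download` stands in for
// any callback-style function with the signature download(url, cb).
function downloadTolerant(download, url, cb) {
  download(url, (err, result) => {
    // Node.js system errors carry the error name in `err.code`.
    if (err && err.code === 'ECONNREFUSED') {
      console.warn(`Connection refused for ${url}, skipping`);
      // Report "no result" instead of an error so the caller keeps going.
      return cb(null, null);
    }
    // Any other error (or a successful result) is passed through unchanged.
    cb(err, result);
  });
}
```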
In this section, we’ll see how we can make our spider more resilient by keeping the concurrency limited.
The following illustration shows a situation where we have five tasks that run in parallel with a concurrency limit of two.
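The scenario just described, five tasks with a concurrency limit of two, can be sketched in plain callback style as follows. This is a minimal sketch and not the spider's actual implementation. The names runLimited and makeTask are hypothetical, and the tasks are simulated with setTimeout.

```javascript
// Hypothetical helper: runs callback-style tasks with at most
// `concurrency` of them in flight at any time.
function runLimited(tasks, concurrency, onDone) {
  let running = 0;    // tasks currently in flight
  let completed = 0;  // tasks that have finished successfully
  let index = 0;      // position of the next task to start
  let failed = false; // set once any task reports an error

  if (tasks.length === 0) {
    return process.nextTick(onDone);
  }

  function next() {
    // Start tasks until the limit is reached or none are left.
    while (!failed && running < concurrency && index < tasks.length) {
      const task = tasks[index++];
      running++;
      task((err) => {
        running--;
        if (failed) return; // an earlier failure was already reported
        if (err) {
          failed = true;
          return onDone(err);
        }
        if (++completed === tasks.length) {
          return onDone();
        }
        next(); // a slot was freed, so try to start another task
      });
    }
  }

  next();
}

// Simulated task: waits `ms` milliseconds, then reports completion.
function makeTask(id, ms) {
  return (cb) => {
    console.log(`task ${id} started`);
    setTimeout(() => {
      console.log(`task ${id} finished`);
      cb();
    }, ms);
  };
}

const tasks = [1, 2, 3, 4, 5].map((id) => makeTask(id, 100));
runLimited(tasks, 2, (err) => {
  if (err) return console.error(err);
  console.log('all tasks completed');
});
```

Each time a task completes, it frees a slot, and the next waiting task starts. At no point are more than two tasks in flight.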