Globally Limiting Concurrency

Learn about the role of queues and events in globally limiting concurrency.

Our web spider application is a perfect candidate for applying what we just learned about limiting the concurrency of a set of tasks. In fact, to avoid a situation in which thousands of links are being crawled at the same time, we can enforce a limit on the concurrency of this process, making the number of concurrent downloads predictable.

We could apply this implementation of the limited concurrency pattern to our spiderLinks() function, but by doing that, we would only be limiting the concurrency of the tasks spawned from the links found within a single page. If we chose, for example, a concurrency of two, we would have at most two links downloaded in parallel for each page. However, since each downloaded page can in turn spawn another two downloads, the grand total of download operations would still grow exponentially.

In general, this implementation of the limited concurrency pattern works very well when we have a predetermined set of tasks to execute, or when the set of tasks grows linearly over time. When, instead, a task can spawn two or more tasks directly, as happens with our web spider, this implementation is not suitable for limiting the global concurrency.
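A quick back-of-the-envelope calculation illustrates the problem. The numbers below are purely hypothetical (a per-page limit of two and pages that each contain at least two links), but they show why a local limit does not bound the global concurrency:

```javascript
// Illustrative only: with a per-page concurrency limit of 2, and
// assuming every crawled page contains at least 2 links, the number
// of downloads that can be in flight at nesting depth d is 2^d.
const perPageLimit = 2

for (let depth = 1; depth <= 5; depth++) {
  console.log(`depth ${depth}: up to ${perPageLimit ** depth} concurrent downloads`)
}
```

The limit is applied per page, so it multiplies at every level of nesting instead of capping the total.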

Queues to the rescue

What we really want, then, is to limit the global number of download operations we can have running in parallel. We can slightly modify the pattern shown in the previous section, but that’s left as an exercise for you. Instead, let’s discuss another mechanism that makes use of queues to limit the concurrency of multiple tasks. Let’s see how this works.

We’re now going to implement a simple class named TaskQueue, which will combine a queue with the algorithm that was presented while discussing limited concurrency. Let’s create a new module named taskQueue.js.
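As a preview of the idea, here is a minimal sketch of such a class. The method names (pushTask(), next()) and the callback-based task signature are assumptions for illustration; the point is that a single queue and a single running counter enforce one global concurrency limit, no matter where the tasks come from:

```javascript
// taskQueue.js (sketch)
class TaskQueue {
  constructor (concurrency) {
    this.concurrency = concurrency // max tasks running at once, globally
    this.running = 0               // tasks currently in flight
    this.queue = []                // tasks waiting to run
  }

  // A task is a function that accepts a completion callback.
  pushTask (task) {
    this.queue.push(task)
    process.nextTick(this.next.bind(this))
    return this
  }

  next () {
    // Start queued tasks until the global limit is reached.
    while (this.running < this.concurrency && this.queue.length) {
      const task = this.queue.shift()
      task(() => {
        this.running--
        process.nextTick(this.next.bind(this))
      })
      this.running++
    }
  }
}

module.exports = TaskQueue
```

Every producer, including tasks that spawn other tasks, pushes into the same queue, so the running count can never exceed the configured concurrency.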
