Sequential Execution: Crawling of Links

Explore the implementation of sequential asynchronous link crawling in Node.js. Learn how to obtain the internal links of a page and iterate over them with a controlled, callback-based approach, building a recursive web spider that processes links one at a time.

Sequential crawling of links

Now, we can create the core of this new version of our web spider application: the spiderLinks() function, which downloads all the links of an HTML page using a sequential asynchronous iteration algorithm. Pay attention to the way it's defined in the following code block:

Node.js
function spiderLinks (currentUrl, body, nesting, cb) {
  if (nesting === 0) {
    return process.nextTick(cb)
  }

  const links = getPageLinks(currentUrl, body) // (1)
  if (links.length === 0) {
    return process.nextTick(cb)
  }

  function iterate (index) { // (2)
    if (index === links.length) {
      return cb()
    }

    spider(links[index], nesting - 1, function (err) { // (3)
      if (err) {
        return cb(err)
      }
      iterate(index + 1)
    })
  }

  iterate(0) // (4)
}

The important steps to understand in this new function are as follows:

  1. We obtain the list of all the links contained in the page using the getPageLinks() function.
  2. We iterate over the links using a local function called iterate(), which takes the index of the next link to analyze. The first thing we do in this function is check whether the index is equal to the length of the links array, in which case we immediately invoke cb(), as it means we have processed all the links.
  3. At this point, everything is ready for processing the link: we invoke the spider() function with a decreased nesting level and, when the operation completes, we either propagate the error to cb() or move on to the next link by calling iterate(index + 1).
  4. As the last step of spiderLinks(), we bootstrap the iteration by invoking iterate(0).
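
To see the iteration technique in isolation from the web spider, here is a minimal sketch of the same sequential iterator pattern applied to a generic collection. The iterateSeries() name, the sample items, and the setTimeout() delay are hypothetical stand-ins introduced only for this illustration; only the iteration logic mirrors spiderLinks():

Node.js
// A minimal sketch of the sequential iterator pattern used by spiderLinks().
// iterateSeries() and the sample tasks below are hypothetical; they are not
// part of the web spider application.
function iterateSeries (collection, iteratorCallback, finalCallback) {
  function iterate (index) {
    if (index === collection.length) {
      return finalCallback()            // all items processed
    }

    iteratorCallback(collection[index], err => {
      if (err) {
        return finalCallback(err)       // stop at the first error
      }
      iterate(index + 1)                // move on to the next item
    })
  }

  iterate(0)                            // bootstrap the iteration
}

// Usage example: process three items, one at a time.
iterateSeries(['a', 'b', 'c'], (item, done) => {
  setTimeout(() => {
    console.log(`processed ${item}`)
    done()
  }, 100)
}, err => {
  if (err) {
    return console.error(err)
  }
  console.log('all items processed in sequence')
})

Note that a plain for loop would not work here: because each operation completes asynchronously, the loop would start all of them at once instead of waiting for the previous one to finish. The recursive call to iterate() inside the completion callback is what enforces the one-at-a-time ordering.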