
Sequential Execution: Crawling of Links

Learn about the sequential crawling of links using a sequential asynchronous iteration algorithm.

Sequential crawling of links

Now we can create the core of this new version of our web spider application: the spiderLinks() function, which downloads all the links of an HTML page using a sequential asynchronous iteration algorithm. Pay attention to how we define it in the following code block:

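```javascript
function spiderLinks (currentUrl, body, nesting, cb) {
  // Stop recursing once the maximum nesting level has been reached
  if (nesting === 0) {
    return process.nextTick(cb)
  }
  const links = getPageLinks(currentUrl, body) // (1)
  // Nothing to crawl: still invoke the callback asynchronously
  if (links.length === 0) {
    return process.nextTick(cb)
  }
  function iterate (index) { // (2)
    // Every link has been processed: we are done
    if (index === links.length) {
      return cb()
    }
    spider(links[index], nesting - 1, function (err) { // (3)
      if (err) {
        return cb(err)
      }
      // Move on to the next link only after the current one has completed
      iterate(index + 1)
    })
  }
  iterate(0) // (4)
}
```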
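Because spider() and getPageLinks() are defined elsewhere in the application, spiderLinks() cannot run on its own. The following is a minimal, self-contained sketch of the same sequential asynchronous iteration pattern, assuming a hypothetical fakeSpider() that simulates a download with a random setTimeout() delay and fails for any URL containing "bad". The name crawlSequentially() is also hypothetical; unlike spiderLinks(), it receives the list of links and the task function as parameters so the pattern can be exercised without any network access.

```javascript
'use strict'

// Hypothetical stand-in for spider(): "downloads" a URL after a random delay,
// records it in `visited`, and fails for any URL that contains "bad".
function makeFakeSpider (visited) {
  return function fakeSpider (url, nesting, cb) {
    setTimeout(() => {
      if (url.includes('bad')) {
        return cb(new Error(`Failed to download ${url}`))
      }
      visited.push(url)
      cb(null)
    }, Math.random() * 20)
  }
}

// Same shape as spiderLinks(), but the links and the task are passed in
// instead of being extracted from an HTML body.
function crawlSequentially (links, nesting, spiderFn, cb) {
  if (nesting === 0 || links.length === 0) {
    return process.nextTick(cb) // keep the callback asynchronous
  }
  function iterate (index) {
    if (index === links.length) {
      return cb(null) // all links processed, one after another
    }
    spiderFn(links[index], nesting - 1, err => {
      if (err) {
        return cb(err) // stop at the first error; remaining links are skipped
      }
      iterate(index + 1) // start the next download only now
    })
  }
  iterate(0)
}

// Usage: despite the random latency, links are visited in array order.
const visited = []
const links = ['https://example.com/a', 'https://example.com/b', 'https://example.com/c']
crawlSequentially(links, 2, makeFakeSpider(visited), err => {
  if (err) {
    return console.error(err.message)
  }
  console.log(visited.join(' -> '))
  // https://example.com/a -> https://example.com/b -> https://example.com/c
})
```

Since iterate(index + 1) is only ever called from inside the completion callback of the previous task, at most one simulated download is in flight at any time, and the first error short-circuits the whole loop.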

The important steps to understand this new function are as follows:

  1. We obtain the ...