Sequential Execution: Crawling of Links
Learn about the sequential crawling of links using a sequential asynchronous iteration algorithm.
Sequential crawling of links
Now we can create the core of this new version of our web spider application: the spiderLinks()
function, which downloads all the links of an HTML page using a sequential asynchronous iteration algorithm. Pay attention to the way it is defined in the following code block:
function spiderLinks (currentUrl, body, nesting, cb) {
  if (nesting === 0) {
    return process.nextTick(cb)
  }

  const links = getPageLinks(currentUrl, body) // (1)
  if (links.length === 0) {
    return process.nextTick(cb)
  }

  function iterate (index) { // (2)
    if (index === links.length) {
      return cb()
    }

    spider(links[index], nesting - 1, function (err) { // (3)
      if (err) {
        return cb(err)
      }
      iterate(index + 1)
    })
  }

  iterate(0) // (4)
}
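Before walking through the steps, it may help to see the underlying pattern in isolation. The sketch below extracts the sequential asynchronous iteration logic of spiderLinks() into a standalone, runnable helper; the download() function is a hypothetical stand-in for spider(), simulating an asynchronous operation with setTimeout.

```javascript
// Hypothetical stand-in for spider(): completes asynchronously.
function download (url, cb) {
  setTimeout(() => {
    console.log(`downloaded ${url}`)
    cb(null)
  }, 10)
}

// Generic sequential asynchronous iteration: run `task` on each item,
// one at a time, starting the next only when the previous completes.
function iterateSeries (items, task, cb) {
  function iterate (index) {
    if (index === items.length) {
      return cb() // all items processed
    }
    task(items[index], err => {
      if (err) {
        return cb(err) // fail fast, propagating the error
      }
      iterate(index + 1) // recurse only after the current task ends
    })
  }
  iterate(0)
}

iterateSeries(['a.html', 'b.html', 'c.html'], download, err => {
  if (err) {
    return console.error(err)
  }
  console.log('done')
})
```

Because each invocation of iterate() happens inside the completion callback of the previous task, the items are always processed strictly in order, and an error at any step aborts the whole iteration.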
The important steps to understand this new function are as follows:
We obtain the list of all the links contained in the page by calling getPageLinks() (1).