Sequential Execution: Crawling of Links
Explore the implementation of sequential asynchronous link crawling in Node.js. Understand how to obtain and iterate over internal page links using a controlled callback-based approach. This lesson demonstrates building a recursive web spider that processes links one at a time, teaching safe and efficient asynchronous iteration techniques.
We'll cover the following...
Sequential crawling of links
Now we can create the core of this new version of our web spider application: the spiderLinks() function, which downloads all the links found in an HTML page using a sequential asynchronous iteration algorithm. Pay attention to the way it is defined in the following code block:
The important steps to understand in this new function are as follows:
We obtain the ...