Web Crawler Example
This lesson discusses a textbook example of a concurrent program: the web crawler.
From what we have learnt so far, asyncio is an excellent choice for programs that spend most of their time waiting on blocking operations. Usually, there are two kinds of blocking operations:
network I/O
disk I/O
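To make the idea concrete, here is a minimal sketch in which two simulated waits overlap on a single thread. The names simulated_io and run_both are hypothetical, and asyncio.sleep merely stands in for a real network or disk wait; it is an assumption for illustration, not how real I/O is performed.

```python
import asyncio


async def simulated_io(name, delay, log):
    """Pretend to wait on I/O; asyncio.sleep stands in for the real wait."""
    log.append(f"start {name}")
    await asyncio.sleep(delay)  # control returns to the event loop here
    log.append(f"end {name}")
    return name


async def run_both():
    """Run a 'network' wait and a 'disk' wait concurrently."""
    log = []
    # gather schedules both coroutines and returns results in argument order.
    results = await asyncio.gather(
        simulated_io("network", 0.2, log),
        simulated_io("disk", 0.1, log),
    )
    return results, log


if __name__ == "__main__":
    results, log = asyncio.run(run_both())
    print(results)
    print(log)
```

Both "start" entries land in the log before either "end" entry, which shows that the second wait begins while the first is still pending.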
Let's start by implementing a simple web crawler. A web crawler is a program that systematically browses the World Wide Web, typically with the intent to index it. For our purposes, we'll simplify our crawler and limit it to fetching the HTML for a list of URLs. The downloaded HTML is passed on to a consumer, which then performs the indexing, but we won't implement that part.
The meat of the problem lies in asynchronously downloading the given URLs. We'll be using the aiohttp module for asynchronous HTTP GET requests. If we were ...
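As a rough illustration of the download step described above, the sketch below fetches several URLs concurrently with aiohttp and hands each page to a consumer. The names fetch_html, crawl, and main, the collect consumer, and the example URLs are all assumptions made for this sketch, not the lesson's own implementation.

```python
import asyncio


async def fetch_html(session, url):
    """Issue one GET request and return the response body as text."""
    async with session.get(url) as response:
        response.raise_for_status()  # raise on 4xx/5xx responses
        return await response.text()


async def crawl(urls, consumer, session):
    """Download every URL concurrently and pass each page to the consumer."""

    async def fetch_and_hand_off(url):
        html = await fetch_html(session, url)
        consumer(url, html)  # indexing would happen here; not implemented

    await asyncio.gather(*(fetch_and_hand_off(url) for url in urls))


async def main(urls):
    # Third-party dependency (pip install aiohttp); imported here so the
    # helpers above can be used without it, for example in tests.
    import aiohttp

    pages = {}

    def collect(url, html):
        """Hypothetical consumer: just remember what was downloaded."""
        pages[url] = html

    # One session is shared by all requests so connections can be reused.
    async with aiohttp.ClientSession() as session:
        await crawl(urls, collect, session)
    return pages


if __name__ == "__main__":
    # Placeholder URLs, chosen only for illustration.
    downloaded = asyncio.run(main(["https://example.com", "https://example.org"]))
    for url, html in downloaded.items():
        print(url, len(html))
```

In this sketch a single failed request makes asyncio.gather raise and the crawl stops; passing return_exceptions=True to gather is one way to collect failures instead.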