Scrapy

Let's explore why Scrapy is a great Python framework for crawling websites.

Overview

Scrapy is a framework that we can use for crawling websites and extracting (i.e. scraping) data. It can also be used to extract data via a website’s API or as a general-purpose web crawler. To install Scrapy, all we need is pip:

pip3 install scrapy

According to Scrapy’s documentation, we will also need lxml and OpenSSL installed.

To install lxml, we will use the command:

pip3 install lxml

To install pyOpenSSL, the Python wrapper around OpenSSL, we will use the command:

pip3 install pyopenssl
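
As a quick sanity check after the installs above, a short Python script can confirm that the three packages are importable. This is only a minimal sketch: the function name check_dependencies is made up for illustration, and it assumes the usual import names (scrapy, lxml, and OpenSSL, which is the import name of the pyopenssl package).

```python
import importlib.util

# Import names for the packages installed above. Note that the
# pyopenssl package is imported as "OpenSSL".
REQUIRED_MODULES = ["scrapy", "lxml", "OpenSSL"]


def check_dependencies(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it can be found."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in modules
    }


if __name__ == "__main__":
    # Print one status line per required module.
    for name, found in check_dependencies().items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

If any module is reported as missing, rerunning the matching pip3 command from above should fix it.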

We are going to use Scrapy to do the same thing that we used BeautifulSoup for: scraping the title and link of each article on my blog’s front page. To get started, open up a terminal and change to the directory where we want to store our project. Then run the following command:

scrapy startproject blog_scraper

This will create a directory named blog_scraper in the current directory, which will contain the following items:

  • Another nested blog_scraper folder
...