Scrapy
Let's explore Scrapy, a Python framework for crawling websites.
Overview
Scrapy is a framework that we can use for crawling websites and extracting (i.e. scraping) data. It can also be used to extract data via a website’s API or as a general-purpose web crawler. To install Scrapy, all we need is pip:
pip3 install scrapy
According to Scrapy’s documentation, we will also need lxml and OpenSSL installed.
To install lxml, we will use the command:
pip3 install lxml
To install OpenSSL support through the pyOpenSSL package, we will use the command:
pip3 install pyopenssl
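Once the three packages are installed, it can be handy to confirm that Python can actually find them. The following is a minimal sketch, not part of Scrapy itself: the helper `check_modules` and the constant `REQUIRED_MODULES` are hypothetical names introduced here for illustration. It relies on the fact that the pyopenssl package is imported under the name `OpenSSL`.

```python
import importlib.util

# Top-level import names for the packages installed above.
# Note: the pip package "pyopenssl" is imported as "OpenSSL".
# REQUIRED_MODULES and check_modules are illustrative names, not Scrapy APIs.
REQUIRED_MODULES = ("scrapy", "lxml", "OpenSSL")


def check_modules(names=REQUIRED_MODULES):
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Print one status line per module so a missing install is easy to spot.
    for name, found in check_modules().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If any module is reported as missing, rerunning the matching pip3 command above is usually the first thing to try.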
We are going to use Scrapy to do the same thing that we used BeautifulSoup for, which was scraping the title and link of the articles on my blog’s front page. To get started, all we need to do is open up a terminal and change directories to the one that we want to store our project in. Then run the following command:
scrapy startproject blog_scraper
This will create a directory named blog_scraper in the current directory, which will contain the following items:
- Another nested blog_scraper folder