Solution Review: Scrape the Web Page Using Beautiful Soup

Review the solution for the book information scraping task.

Solution

We start by inspecting the web page and finding the elements we want.

Inspecting the DOM of the first page
import requests
from requests.compat import urljoin
from bs4 import BeautifulSoup

base_url = "https://books.toscrape.com/"

titles = []
images = []
rates = []
prices = []

# Solution
response = requests.get(base_url)
soup = BeautifulSoup(response.content, 'html.parser')
articles = soup.find_all("article", {"class": "product_pod"})

for article in articles:
    image = urljoin(base_url,
                    article.find("div", {"class": "image_container"}).a.img['src'])
    rate = article.find("p", {"class": "star-rating"})['class'][1]
    title = article.find("h3").a['title']
    price = article.find("div", {"class": "product_price"}).p.string
    titles.append(title)
    images.append(image)
    rates.append(rate)
    prices.append(price)

print("Length of scraped titles: ", len(titles))
print("Length of scraped images: ", len(images))
print("Length of scraped rates: ", len(rates))
print("Length of scraped prices: ", len(prices))
print(titles)
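
To see what each lookup in the loop returns without touching the network, the same four expressions can be tried on a small inline fragment. This is only a sketch: the HTML below is a made-up stand-in that assumes the markup shape the solution relies on (an article with class product_pod containing an image container, a star-rating paragraph, an h3 link, and a price block), and its title, image path, and price are invented sample values. It uses urljoin from the standard library's urllib.parse in place of requests.compat so that it runs offline.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

base_url = "https://books.toscrape.com/"

# Made-up fragment (an assumption, not copied from the site) that mimics
# the structure the solution expects for one book.
sample_html = """
<article class="product_pod">
  <div class="image_container">
    <a href="catalogue/sample-book/index.html"><img src="media/sample.jpg" alt="Sample Book"></a>
  </div>
  <p class="star-rating Three"></p>
  <h3><a href="catalogue/sample-book/index.html" title="Sample Book">Sample ...</a></h3>
  <div class="product_price">
    <p class="price_color">£10.00</p>
    <p class="instock availability">In stock</p>
  </div>
</article>
"""

soup = BeautifulSoup(sample_html, "html.parser")
article = soup.find("article", {"class": "product_pod"})

# The src attribute is a relative path, so it is joined with the base URL.
image = urljoin(base_url, article.find("div", {"class": "image_container"}).a.img["src"])

# "class" is a multi-valued attribute, so Beautiful Soup returns a list;
# index 1 holds the rating word that follows "star-rating".
rate = article.find("p", {"class": "star-rating"})["class"][1]

# The full title is stored in the link's title attribute, not in its text.
title = article.find("h3").a["title"]

# .p picks the first <p> inside the price block, which holds the price text.
price = article.find("div", {"class": "product_price"}).p.string

print(image)
print(rate)
print(title)
print(price)
```

Reading the rating from the class list works because Beautiful Soup splits a class attribute on whitespace, which is why the solution indexes into it instead of parsing any text.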

Code explanation

  • Lines 13–14: We request the site URL using requests.get() and pass response.content to BeautifulSoup(). ...
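
The request-and-parse step can also be wrapped in a small helper, shown here as a hedged sketch. The names make_soup and fetch_soup are hypothetical and not part of the lesson's solution, and the timeout and the raise_for_status() call are additions the solution does not use. The demonstration at the bottom parses a made-up byte string, so nothing is downloaded when the block runs.

```python
import requests
from bs4 import BeautifulSoup


def make_soup(content):
    """Parse raw HTML (bytes or text) with Python's built-in html.parser."""
    return BeautifulSoup(content, "html.parser")


def fetch_soup(url, timeout=10):
    """Download a page and parse it (hypothetical helper, not from the lesson)."""
    response = requests.get(url, timeout=timeout)
    # Fail early on 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    # response.content is the raw body as bytes; BeautifulSoup accepts it directly.
    return make_soup(response.content)


# Offline demonstration with made-up bytes, standing in for response.content.
sample = b"<html><head><title>Sample page</title></head><body></body></html>"
soup = make_soup(sample)
print(soup.title.string)
```

With network access, fetch_soup("https://books.toscrape.com/") would return a soup object that can be queried with find_all() exactly as in the solution.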