Extract Books Under All Categories

Learn how to scrape all the books in every category.

In this lesson, we'll scrape all the books on the “Books to Scrape” bookstore website. Browsing the website, we can see that every book is listed under a category. We can use this structure to reach every book on the site.

Approach

We can do this by scraping the books category by category. The first step is to extract the links to all the category pages. Next, we visit each category page and extract the links to its book details pages, following the pagination until we reach the last page of that category. Finally, we visit each book details page and scrape the data we need.
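To make the pagination step concrete, here is a minimal sketch in plain JavaScript (Node.js assumed). It is not the lesson's code: `collectBookLinks` and the `loadPage` helper it receives are hypothetical names, and `loadPage` is assumed to open one listing page and return the raw book hrefs on it together with the href of its "next" link, or `null` on the last page. Keeping page loading behind a function lets the loop stay the same whichever browser-automation or HTTP library does the fetching.

```javascript
// Minimal sketch of the "extract book links while paginating" step.
// `collectBookLinks` and `loadPage` are hypothetical names, not the lesson's code.
// `loadPage(url)` is assumed to resolve to { bookHrefs: string[], nextHref: string | null }.

async function collectBookLinks(categoryUrl, loadPage) {
  const bookLinks = [];
  const visited = new Set(); // guards against a "next" link that loops back
  let pageUrl = categoryUrl;

  while (pageUrl && !visited.has(pageUrl)) {
    visited.add(pageUrl);
    const { bookHrefs, nextHref } = await loadPage(pageUrl);

    // Resolve each (possibly relative) href against the page it came from.
    for (const href of bookHrefs) {
      bookLinks.push(new URL(href, pageUrl).href);
    }

    // A missing "next" link means this was the category's last page.
    pageUrl = nextHref ? new URL(nextHref, pageUrl).href : null;
  }
  return bookLinks;
}

// Example: two stub listing pages standing in for one real category.
const stubPages = {
  "https://example.com/catalogue/category/travel/index.html": {
    bookHrefs: ["../../book-a/index.html"],
    nextHref: "page-2.html",
  },
  "https://example.com/catalogue/category/travel/page-2.html": {
    bookHrefs: ["../../book-b/index.html"],
    nextHref: null,
  },
};

collectBookLinks(
  "https://example.com/catalogue/category/travel/index.html",
  async (url) => stubPages[url],
).then((links) => console.log(links)); // absolute URLs for book-a and book-b
```

Resolving every href with `new URL(href, pageUrl)` matters because listing pages often link with relative paths; the `visited` set is a cheap safeguard in case a "next" link ever points back to a page that was already scraped.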

Implementing the approach

The code snippet below shows a sample implementation of this approach. We have split it into separate files to keep the code modular and easy to follow. Because the website has many books, we scrape only the first two categories so that the execution finishes quickly. This limit can be changed to scrape more or less data.

  • extractCategoryLinks.js: In this file, we extract the links to each category page.

  • extractBookLinks.js: In this file, we extract the links to the book details pages listed under a category.

  • extractBookDetails.js: In this file, we scrape the details from a book details page and return them.

  • index.js: In this file, we initiate the execution and log the scraped data.

Run the code snippet shown to see this in action. Note that this execution takes longer than the previous ones because the script has to visit many pages. If needed, you can stop the scraping process by pressing Ctrl + C.
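The four files described earlier could be wired together roughly as follows. This is a hedged sketch, not the lesson's index.js: the helper signatures are assumptions, and the helpers are passed in as parameters so the example runs with stubs instead of a live browser. The hypothetical `maxCategories` parameter plays the role of the two-category limit.

```javascript
// Sketch of how index.js could tie the three helpers together.
// The helper signatures are assumptions; they are injected as parameters so
// the same loop works with any browser-automation or HTTP library.

async function scrapeBooks({
  homeUrl,
  maxCategories,
  extractCategoryLinks, // (homeUrl) => Promise<string[]>
  extractBookLinks, // (categoryUrl) => Promise<string[]>
  extractBookDetails, // (bookUrl) => Promise<object>
}) {
  const categoryLinks = await extractCategoryLinks(homeUrl);
  const books = [];

  // Visit only the first `maxCategories` categories to keep the run short;
  // pass Infinity to scrape every category.
  for (const categoryUrl of categoryLinks.slice(0, maxCategories)) {
    const bookLinks = await extractBookLinks(categoryUrl);

    // Scrape the details pages one at a time to keep the load on the site low.
    for (const bookUrl of bookLinks) {
      books.push(await extractBookDetails(bookUrl));
    }
  }
  return books;
}

// Example run with stub helpers; "cat-c" is skipped because maxCategories is 2.
scrapeBooks({
  homeUrl: "https://example.com/",
  maxCategories: 2,
  extractCategoryLinks: async () => ["cat-a", "cat-b", "cat-c"],
  extractBookLinks: async (categoryUrl) => [`${categoryUrl}/book-1`],
  extractBookDetails: async (bookUrl) => ({ url: bookUrl }),
}).then((books) => console.log(books));
```

Awaiting each details page in turn is slower than requesting them concurrently with `Promise.all`, but it is simpler and gentler on the website, which is also one reason a full run takes a while.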
