PROJECT
Scraping Wikipedia Using Selenium in Python
In this unguided project, we’ll scrape Wikipedia using tools provided by the Selenium library in Python. We’ll practice fetching HTML data with multiple Selenium commands. Lastly, we’ll learn to automate events on a web page.
You will learn to:
Understand the fundamentals of Selenium methods.
Automate the events on a web page using Selenium.
Use regex for text cleaning.
Create Python dictionaries from scraped data.
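As a rough illustration of the last two goals, the sketch below cleans text with regular expressions and builds a dictionary from it. The sample rows are hand-written stand-ins for scraped text, and the helper names clean_text and rows_to_dict are hypothetical, not part of the project itself.

```python
import re


def clean_text(raw):
    """Remove bracketed markers such as [1] and collapse runs of whitespace."""
    without_refs = re.sub(r"\[[^\]]*\]", "", raw)
    return re.sub(r"\s+", " ", without_refs).strip()


def rows_to_dict(rows):
    """Build a dictionary from (label, value) text pairs, cleaning both sides."""
    return {clean_text(label): clean_text(value) for label, value in rows}


# Hand-written sample rows standing in for text scraped from a page.
scraped_rows = [
    ("Designed by[1]", "Guido  van Rossum"),
    ("First appeared", "20 February 1991[2]\n"),
]

print(rows_to_dict(scraped_rows))
# {'Designed by': 'Guido van Rossum', 'First appeared': '20 February 1991'}
```

The bracket pattern is deliberately broad, so it would also strip markers like [citation needed]. A narrower pattern may be preferable if the scraped text contains meaningful brackets.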
Skills
Web Scraping
Python Programming
HTML Elements
Prerequisites
Basic understanding of the Python language
Basic understanding of the Selenium library
Basic understanding of the Python regex library
Basic understanding of Python dictionaries
Technologies
CSS
HTML
Python
Selenium
Project Description
In this unguided project, we’ll use the Selenium library in Python to scrape data from Wikipedia, the free online encyclopedia. Throughout this project, we’ll use multiple Selenium commands to fetch HTML elements using the following attributes:
- CSS class names
- CSS IDs
- HTML tag names
- Link texts
- Texts
- Nested CSS selectors
- Attributes
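As a minimal sketch of how each attribute type in the list above could map to a Selenium locator strategy, consider the helper below. The helper name fetch_by_each_strategy is hypothetical, and every selector value (mw-body, firstHeading, bodyContent, the "Main page" link) is an assumption about Wikipedia's markup that may not hold for every page or skin.

```python
def fetch_by_each_strategy(driver):
    """Run one lookup per locator strategy and return the matches by label.

    `driver` is any Selenium WebDriver that already has a Wikipedia page open.
    """
    # Imported inside the function so the helper can be defined before
    # Selenium is set up; a script would normally import this at the top.
    from selenium.webdriver.common.by import By

    # Every selector value below is an assumption about Wikipedia's markup.
    lookups = {
        "CSS class name": (By.CLASS_NAME, "mw-body"),
        "CSS ID": (By.ID, "firstHeading"),
        "HTML tag name": (By.TAG_NAME, "p"),
        "Link text": (By.LINK_TEXT, "Main page"),
        "Text": (By.XPATH, "//a[text()='Main page']"),
        "Nested CSS selector": (By.CSS_SELECTOR, "div#bodyContent p a"),
        "Attribute": (By.CSS_SELECTOR, "a[title]"),
    }
    # find_elements returns an empty list (rather than raising) when
    # nothing matches, so a wrong guess about the markup is harmless here.
    return {
        label: driver.find_elements(by, value)
        for label, (by, value) in lookups.items()
    }
```

With a live browser session, this could be called right after driver.get("https://en.wikipedia.org/wiki/Main_Page"), and the length of each returned list shows how many elements each strategy matched.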
Furthermore, we’ll use multiple Selenium events to automate the processes on this website. Finally, we’ll use the regex library to clean the text data.
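To make the idea of Selenium events concrete, here is a hedged sketch of two common ones: typing into a search box and clicking a link. The helper names search_wikipedia and follow_link are hypothetical, and the assumption that Wikipedia's search input has the name attribute "search" may not hold for every page layout.

```python
def search_wikipedia(driver, query):
    """Type `query` into the search box and submit it with the Enter key."""
    # Imported inside the function so the helper can be defined before
    # Selenium is set up; a script would normally import these at the top.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys

    # Assumption: Wikipedia's search input carries name="search".
    search_box = driver.find_element(By.NAME, "search")
    search_box.clear()                # remove any text already in the box
    search_box.send_keys(query)       # keyboard event: type the query
    search_box.send_keys(Keys.ENTER)  # keyboard event: submit the search


def follow_link(driver, link_text):
    """Click the first link whose visible text equals `link_text`."""
    from selenium.webdriver.common.by import By

    driver.find_element(By.LINK_TEXT, link_text).click()  # mouse event
```

Both helpers expect a WebDriver that already has a Wikipedia page open, for example one created with webdriver.Chrome() and pointed at the site with driver.get(...). Because find_element raises an exception when nothing matches, a real script may want to wrap these calls in error handling.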
Project Tasks
1. Initial Setup
Task 1: Get Started
Task 2: Navigate to the Web Page
2. Scrape the Data
Task 3: Perform the Search Operation
Task 4: Fetch an Element Using Link Text
Task 5: Fetch Elements Using the Tag Name
Task 6: Fetch Nested Elements
Congratulations!
Relevant Courses
Use the following content to review prerequisites or explore specific concepts in detail.