...
Web Scraping using Beautiful Soup
Learn how to scrape GitHub data using Beautiful Soup.
Scrape GitHub data
Create a function to get the data:
def getData(userName):
pass
The URL https://github.com/{userName}?tab=repositories contains the user's information and their recent public repositories. We'll use the requests library to fetch the contents of the page.
Let’s run the following code to scrape the user’s data:
import requests

userName = input("Enter GitHub user name: ")
url = "https://github.com/{}?tab=repositories".format(userName)
page = requests.get(url)
decoded = page.content.decode("utf-8")  # Decode the response bytes into an HTML string

# Create and save the HTML file
with open("index.html", "w") as f:
    f.write(decoded)
Displaying a repository's content
Next, we create an instance of BeautifulSoup and pass page.content as the parameter. We also create an empty dictionary to store the user information.
from bs4 import BeautifulSoup

soup = BeautifulSoup(page.content, 'html.parser')
info = {}
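To see how the parser extracts data before working with GitHub's real markup, here is a minimal, self-contained sketch. The HTML fragment and its class names are invented for illustration only:

```python
from bs4 import BeautifulSoup

# A tiny HTML fragment standing in for a downloaded page (illustrative only)
html = """
<div class="profile">
  <span class="vcard-fullname">Ada Lovelace</span>
  <span class="followers">42 followers</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
info = {}

# find() returns the first matching element;
# get_text(strip=True) extracts its text without surrounding whitespace
info["name"] = soup.find("span", class_="vcard-fullname").get_text(strip=True)
info["followers"] = soup.find("span", class_="followers").get_text(strip=True)

print(info)  # {'name': 'Ada Lovelace', 'followers': '42 followers'}
```

The same find/get_text pattern applies to the real page: only the tag names and class names change.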
We'll scrape the following information:
- Full name
- Image
- Number of followers
- Number of users following
- Location (if it exists)
- Portfolio URL (if it exists)
- Repo name, repo link, repo last update, repo programming language, repo description
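The fields above can be collected into the info dictionary with the find pattern. The sketch below runs against a sample fragment that mimics the shape of a profile page; the class names used here (vcard-fullname, avatar, and so on) are assumptions for illustration, and the actual selectors are identified in the sections that follow:

```python
from bs4 import BeautifulSoup

# Sample markup mirroring the layout of a profile page.
# All class names here are illustrative assumptions, not GitHub's real ones.
html = """
<img class="avatar" src="https://example.com/avatar.png" />
<span class="vcard-fullname">Jane Doe</span>
<span class="followers">10</span>
<span class="following">5</span>
<li class="vcard-location">Berlin</li>
"""

soup = BeautifulSoup(html, "html.parser")
info = {}

info["name"] = soup.find("span", class_="vcard-fullname").get_text(strip=True)
info["image"] = soup.find("img", class_="avatar")["src"]  # attributes read like dict keys
info["followers"] = soup.find("span", class_="followers").get_text(strip=True)
info["following"] = soup.find("span", class_="following").get_text(strip=True)

# Location and portfolio URL may be absent, so guard against find() returning None
location = soup.find("li", class_="vcard-location")
if location:
    info["location"] = location.get_text(strip=True)

print(info)
```

Guarding the optional fields matters: calling get_text on the None that find() returns for a missing element raises an AttributeError.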
Full name
The full name is inside an element ...