Web Scraping using Beautiful Soup

Learn how to scrape GitHub data using Beautiful Soup.

Scrape GitHub data

Create a function to get the data:

def getData(userName):
    pass

The URL https://github.com/{user_name}?tab=repositories contains the user’s information and their recent public repositories. We’ll use the requests library to get the contents of the page.

Let’s run the following code to scrape the user’s data:

import requests

userName = input("Enter GitHub user name: ")
url = "https://github.com/{}?tab=repositories".format(userName)
page = requests.get(url)
decoded = page.content.decode("utf-8") # Decoding the raw bytes into an HTML string

# Creating and saving the HTML file
with open("index.html", "w", encoding="utf-8") as f:
    f.write(decoded)
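If the user name doesn't exist, GitHub returns a 404 error page, and the code above would save it anyway. A small guard can surface the error instead. The sketch below is one way to do this; the helper names are our own:

```python
import requests

def profile_url(user_name):
    # Build the repositories-tab URL for a given user (hypothetical helper)
    return "https://github.com/{}?tab=repositories".format(user_name)

def fetch_profile_html(user_name):
    # Raise for 4xx/5xx responses instead of silently saving an error page
    page = requests.get(profile_url(user_name), timeout=10)
    page.raise_for_status()
    return page.content.decode("utf-8")
```

Calling `fetch_profile_html` on an unknown user then raises `requests.exceptions.HTTPError` rather than producing a misleading `index.html`.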
Displaying a repository's content

Next, we create an instance of BeautifulSoup and pass page.content as the parameter. We create an empty dictionary to store the user information.

from bs4 import BeautifulSoup

soup = BeautifulSoup(page.content, 'html.parser')
info = {}

We’ll scrape the following information:

  • Full name
  • Image
  • Number of followers
  • Number of users they follow
  • Location (if it exists)
  • Portfolio URL (if it exists)
  • Repo name, repo link, repo last update, repo programming language, repo description

Full name

The full name is inside an element ...
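As a minimal sketch of the extraction step, assume the name sits in a span whose class includes vcard-fullname (this class name is our assumption here; GitHub's markup can change, so always verify it in the saved index.html). Against a stand-in snippet:

```python
from bs4 import BeautifulSoup

# Stand-in for a fragment of the real profile page (class name assumed)
html = '<span class="p-name vcard-fullname">Ada Lovelace</span>'
soup = BeautifulSoup(html, "html.parser")

# find() matches when vcard-fullname is any one of the element's classes
name_tag = soup.find("span", class_="vcard-fullname")
full_name = name_tag.get_text(strip=True) if name_tag else None
print(full_name)  # → Ada Lovelace
```

The `if name_tag else None` guard matters because `find()` returns `None` when nothing matches, and calling `get_text` on `None` would crash the scraper.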