Fetching Text from a Website

Develop the web scraping and data saving program.

These are the libraries that we will be using.

import requests
from bs4 import BeautifulSoup
import openpyxl

Getting text

The first function to create accepts a website address (URL) as an argument and returns the page's source (the HTML, as a str). Because it makes no assumptions about which URL it is given, it is a neutral, reusable function that you can include in any other program you write.

def scrape_website(address: str) -> str:
    """
    Scrape the properties website and return the response text
    :param address: URL of website to scrape
    :return: str as response.text
    """
    headers = {'user-agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:74.0) Gecko/20100101 Firefox/74.0"}
    r = requests.get(address, headers=headers)
    return r.text
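
The function above returns whatever the server sends back, including error pages, and it waits indefinitely if the server never answers. As a minimal sketch of a more defensive variant, the version below adds a timeout and raises an exception on HTTP error codes. The name fetch_text, the DEFAULT_HEADERS constant, and the 10-second default are assumptions made for illustration and are not part of the original program.

```python
import requests

# Assumption: reuse the same browser-like user-agent as scrape_website.
# Replace it with your own browser's user-agent string.
DEFAULT_HEADERS = {
    "user-agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:74.0) "
        "Gecko/20100101 Firefox/74.0"
    )
}


def fetch_text(address: str, timeout: float = 10.0) -> str:
    """
    Hypothetical variant of scrape_website with basic error handling.
    :param address: URL of website to fetch
    :param timeout: seconds to wait for the server before giving up
    :return: str as response.text
    """
    r = requests.get(address, headers=DEFAULT_HEADERS, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of
    # silently returning the text of an error page.
    r.raise_for_status()
    return r.text
```

A caller could wrap fetch_text(url) in try/except requests.RequestException to handle timeouts, connection failures, and HTTP errors in one place.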

Headers are the metadata your browser sends along with its request for a webpage. The user-agent header identifies the client software making the request, typically the browser and operating system. Because requests sent with the library's default user-agent (python-requests followed by a version number) are very obviously automated, it can be good practice to send your normal browser user-agent to show that you mean no harm. The easiest way to find your browser’s user-agent is to type “What is my user agent?” into a search engine. Requests will still work if you just get() the URL, but ...