Web scraping is a method of extracting data from websites. It uses software that simulates a human visitor's browsing to collect the information available on the targeted site.
Beautiful Soup is a Python library that makes it easy to scrape information from web pages. It sits atop an HTML or XML parser and provides Pythonic idioms for iterating, searching, and modifying the parse tree.
The Beautiful Soup library helps with isolating titles and links from web pages. It can extract all of the text from HTML tags and alter the HTML in the document we’re working with.
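As a rough illustration of these operations, here is a minimal sketch that pulls the title, links, and text out of a document and then edits it. The HTML snippet and the variable names are made up for this example, and it assumes the `beautifulsoup4` package is installed.

```python
from bs4 import BeautifulSoup

# A small hypothetical page, used only for illustration.
html = """
<html>
  <head><title>Example page</title></head>
  <body>
    <h1>Welcome</h1>
    <p>Read the <a href="/docs">docs</a> or visit the <a href="https://example.com/blog">blog</a>.</p>
  </body>
</html>
"""

# "html.parser" is the parser bundled with Python, so nothing extra is needed.
soup = BeautifulSoup(html, "html.parser")

# Isolate the title and every link target.
page_title = soup.title.string
links = [a["href"] for a in soup.find_all("a")]

# Extract all of the text, with tags stripped out.
text = soup.get_text(" ", strip=True)

# Alter the document: change the heading text and the first link's target.
soup.h1.string = "Hello"
soup.find("a")["href"] = "/documentation"

print(page_title)
print(links)
print(soup.h1)
```

The changes only affect the in-memory parse tree; calling `str(soup)` or `soup.prettify()` gives the modified HTML back as a string.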
A key feature that makes Beautiful Soup stand out is its support for multiple parsers, including lxml and html5lib, which allows us to try out different parsing strategies or trade speed for flexibility. More details can be found on the official website.
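To make the parser choice concrete, the sketch below feeds the same malformed markup to several parsers and prints how each one repairs it. The markup and the helper `parse_with` are hypothetical names introduced for this example. lxml and html5lib are separate packages that may not be installed, so the helper reports a missing parser instead of failing.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Deliberately malformed markup (hypothetical) so the parsers have something to repair.
broken_html = "<p>Unclosed paragraph<li>Stray item"


def parse_with(parser_name, markup):
    """Return the re-serialized tree, or None if the parser is not installed."""
    try:
        return str(BeautifulSoup(markup, parser_name))
    except FeatureNotFound:
        # Beautiful Soup raises this when the requested parser is unavailable.
        return None


results = {
    name: parse_with(name, broken_html)
    for name in ("html.parser", "lxml", "html5lib")
}

for name, output in results.items():
    print(f"{name:12} -> {output if output is not None else 'not installed'}")
```

Each parser may build a somewhat different tree from invalid input, so it can be worth comparing their output on a sample of the pages you intend to scrape before settling on one.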