The html.parser module in Python's standard library is used to parse HTML, which it receives as a string. The module defines a class called HTMLParser.
HTMLParser.close() is one of the methods of HTMLParser. We call it to force the parser to process any data still held in its buffer, as if that data were followed by an end-of-file mark. Incomplete input left in the buffer, such as a lone trailing <, is then handled instead of waiting for more input and can become part of the data.
HTMLParser.close() can also be overridden in a subclass of HTMLParser to add extra processing at the end of the input. The overriding method should always call the base-class close().
HTMLParser.close()
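As a minimal sketch of overriding close(), the subclass below sets a flag once parsing has been finalized. The class name CountingParser and the attributes data and closed are hypothetical names chosen for illustration; only HTMLParser, feed(), handle_data(), and close() come from the standard library.

```python
from html.parser import HTMLParser


class CountingParser(HTMLParser):
    """Hypothetical subclass that records when parsing has been finalized."""

    def __init__(self):
        super().__init__()
        self.data = []       # text passed to handle_data
        self.closed = False  # set to True once close() has run

    def handle_data(self, data):
        self.data.append(data)

    def close(self):
        # Call the base-class method first so buffered input is processed.
        super().close()
        # Extra processing at the end of the input goes here.
        self.closed = True


parser = CountingParser()
parser.feed("<p>Hello</p><")  # the trailing '<' stays in the buffer
parser.close()                # forces the buffer to be processed
print(parser.data, parser.closed)
```

Calling super().close() before the extra work means every handler has already fired by the time the flag is set.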
The code below shows how we can use HTMLParser to separate start tags, end tags, comments, and data in an HTML string, and how the results differ with and without a call to HTMLParser.close().
First, let's see how the code behaves without HTMLParser.close().
from html.parser import HTMLParser


class Parser(HTMLParser):
    # Append each start tag to the list start_tags.
    def handle_starttag(self, tag, attrs):
        start_tags.append(tag)

    # Append each end tag to the list end_tags.
    def handle_endtag(self, tag):
        end_tags.append(tag)

    # Append the data between the tags to the list all_data.
    def handle_data(self, data):
        all_data.append(data)

    # Append each comment to the list comments.
    def handle_comment(self, data):
        comments.append(data)


start_tags = []
end_tags = []
all_data = []
comments = []

# Creating an instance of our class.
parser = Parser()

# Providing the input.
parser.feed('<html><title>Desserts</title><body><p>'
            'I am a fan of frozen yoghurt.</p><')

# We can see the input is incomplete. This puts all the
# incomplete data in the buffer and waits for next input.
print("start tags:", start_tags)
print("end tags:", end_tags)
print("data:", all_data)
print("comments", comments)

# Now we feed more data. This joins the new data with
# previous incomplete data.
parser.feed('/body><!--My first webpage--></html>')
print("")
print("After next input:")
print("start tags:", start_tags)
print("end tags:", end_tags)
print("data:", all_data)
print("comments", comments)
Now, let's use HTMLParser.close(). Because we call close(), the parser processes the data left in its buffer, and the trailing < becomes part of the all_data list.
from html.parser import HTMLParser


class Parser(HTMLParser):
    # Append each start tag to the list start_tags.
    def handle_starttag(self, tag, attrs):
        start_tags.append(tag)

    # Append each end tag to the list end_tags.
    def handle_endtag(self, tag):
        end_tags.append(tag)

    # Append the data between the tags to the list all_data.
    def handle_data(self, data):
        all_data.append(data)

    # Append each comment to the list comments.
    def handle_comment(self, data):
        comments.append(data)


start_tags = []
end_tags = []
all_data = []
comments = []

# Creating an instance of our class.
parser = Parser()

# Providing the input.
parser.feed('<html><title>Desserts</title><body><p>'
            'I am a fan of frozen yoghurt.</p><')

# We can see the input is incomplete. This puts all the
# incomplete data in the buffer and waits for next input.
print("start tags:", start_tags)
print("end tags:", end_tags)
print("data:", all_data)
print("comments", comments)

# Now we make use of close. This will force processing
# of all the data in the buffer. The last
# '<' becomes a part of the all_data list.
parser.close()
print("")
print("After use of close:")
print("start tags:", start_tags)
print("end tags:", end_tags)
print("data:", all_data)
print("comments", comments)
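The same comparison can be sketched more compactly by storing the events on the parser instance instead of in module-level lists. The names EventCollector, events, and parse below are hypothetical helpers introduced for this illustration, not part of html.parser.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper that records every handler call in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_comment(self, data):
        self.events.append(("comment", data))


def parse(html, call_close):
    """Feed html to a fresh parser and optionally finalize it with close()."""
    parser = EventCollector()
    parser.feed(html)
    if call_close:
        parser.close()
    return parser.events


snippet = "<p>I am a fan of frozen yoghurt.</p><"
without_close = parse(snippet, call_close=False)
with_close = parse(snippet, call_close=True)

print("without close:", without_close)
print("with close:   ", with_close)
# Events that appear only because close() processed the buffer.
print("extra events: ", with_close[len(without_close):])
```

Using a fresh parser for each run keeps the two results independent, so the only difference between them comes from the call to close().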