What is html.escape() in Python?

When text is stored in a database or variable and later placed into an HTML page, we need to escape any special characters in it that are meant as literal text but could be mistaken for markup.

These characters include <, >, ", ', and &.

If they are not escaped, these characters may cause the browser to display a web page incorrectly. For example, the following text contains quotation marks around “Edpresso shots” that could be mistaken for the end of one string and the start of another, for instance if the text is placed inside a quoted attribute value.

I love reading "Edpresso shots".

HTML provides special entity names and entity numbers which are essentially escape sequences that replace these characters. Escape sequences in HTML always start with an ampersand and end with a semicolon.
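To make the replacement concrete, here is a minimal sketch that swaps the markup-significant characters for their entity names by hand. The helper name escape_by_hand and the ENTITY_NAMES mapping are hypothetical, chosen only for illustration; the sketch is not a substitute for a library function.

```python
# Hypothetical mapping for illustration: character -> entity name.
ENTITY_NAMES = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}


def escape_by_hand(text: str) -> str:
    """Replace &, <, > and " with their entity names."""
    # The ampersand goes first. If it went last, it would also rewrite
    # the ampersands that the other replacements have just introduced.
    text = text.replace("&", ENTITY_NAMES["&"])
    for char in ("<", ">", '"'):
        text = text.replace(char, ENTITY_NAMES[char])
    return text


print(escape_by_hand('I love reading "Edpresso shots".'))
# I love reading &quot;Edpresso shots&quot;.
print(escape_by_hand("a < b && c > d"))
# a &lt; b &amp;&amp; c &gt; d
```

Every replacement starts with an ampersand and ends with a semicolon, which is why the ampersand itself has to be escaped as well.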

The table below lists the special characters that HTML 4 suggests escaping, with their respective entity names and entity numbers:

Character    Entity name    Entity number
>            &gt;           &#62;
<            &lt;           &#60;
"            &quot;         &#34;
&            &amp;          &#38;
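As a quick check that an entity name and its entity number stand for the same character, the sketch below decodes each pair from the table above. It assumes Python 3.4 or later, where the standard library's html.unescape() function is available; the variable names are illustrative.

```python
import html

# (entity name, entity number) pairs taken from the table above.
pairs = [("&gt;", "&#62;"), ("&lt;", "&#60;"), ("&quot;", "&#34;"), ("&amp;", "&#38;")]

# html.unescape() turns either form back into the literal character.
decoded = [(html.unescape(name), html.unescape(number)) for name, number in pairs]

for (name, number), (from_name, from_number) in zip(pairs, decoded):
    print(name, number, "->", from_name, from_number)
```

Each line of output shows the same character twice, once decoded from the name and once from the number.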

To escape these characters in Python, we can use the html.escape() function, which returns a copy of a string with &, <, and > replaced by HTML-safe sequences. html.escape() takes the string to escape as its first argument, plus one optional argument, quote, which is True by default. When quote is true, " and ' are escaped as well (the single quote becomes &#x27;). To use html.escape(), import the html module from the standard library; the function is available in Python 3.2 and above. Here is how you would use it in code:

Example

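import html

my_html = """& < " ' >"""

# quote=True (the default) also escapes double and single quotes.
encoded_html = html.escape(my_html)
print(encoded_html)  # &amp; &lt; &quot; &#x27; &gt;

# quote=False leaves both kinds of quotes untouched.
encoded_html = html.escape(my_html, quote=False)
print(encoded_html)  # &amp; &lt; " ' &gt;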

First, import the html module. Then pass your string to html.escape(), which returns the escaped version of that string. If you do not wish to escape the quotes, set quote to False. Keep the default when the result will be placed inside a quoted attribute value.
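To show why the default of escaping quotes matters, here is a minimal sketch that places the earlier example sentence inside a double-quoted attribute. The helper render_input is hypothetical and exists only for this illustration; html.unescape() is assumed to be available (Python 3.4 and above).

```python
import html


def render_input(value: str) -> str:
    """Hypothetical helper: embed text in a double-quoted attribute."""
    # With quote=True (the default), " and ' are escaped, so the text
    # cannot close the attribute value early.
    return '<input value="{}">'.format(html.escape(value))


text = 'I love reading "Edpresso shots".'
markup = render_input(text)
print(markup)
# <input value="I love reading &quot;Edpresso shots&quot;.">

# html.unescape() reverses the escaping and recovers the original text.
round_trip = html.unescape(html.escape(text))
print(round_trip == text)  # True
```

With quote=False the inner quotation marks would be left as they are, and the attribute value would end right after "reading ".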

Copyright ©2024 Educative, Inc. All rights reserved