Escaping HTML in Python

04 Oct 2008

The short answer is to do this:

def html_escape(text):
    # '&' must be replaced first, or the entities added below get escaped again
    text = text.replace('&', '&amp;')
    text = text.replace('"', '&quot;')
    text = text.replace("'", '&#39;')
    text = text.replace(">", '&gt;')
    text = text.replace("<", '&lt;')
    return text

I know, I know, it seems kinda "unsophisticated" and too simple. But I tell you, it's the fastest way I found.
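One detail that is easy to miss: the order of the replace calls matters. The ampersand has to go first, otherwise the ampersands introduced by the other replacements get escaped a second time. Here is a small sketch of that; html_escape_wrong_order is a made-up name for a deliberately broken variant, not something from a library.

```python
def html_escape(text):
    # '&' goes first so the entities produced below are not escaped again.
    text = text.replace('&', '&amp;')
    text = text.replace('"', '&quot;')
    text = text.replace("'", '&#39;')
    text = text.replace('>', '&gt;')
    text = text.replace('<', '&lt;')
    return text


def html_escape_wrong_order(text):
    # Hypothetical broken variant: '<' is replaced before '&',
    # so the '&' in '&lt;' is escaped a second time.
    text = text.replace('<', '&lt;')
    text = text.replace('&', '&amp;')
    return text


print(html_escape('<a href="x">Tom & Jerry</a>'))
# &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;

print(html_escape_wrong_order('<b>'))
# &amp;lt;b>
```

The second result would show up in a browser as the literal text "&lt;b>" rather than "<b>", which is the classic double-escaping bug.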

How not to do it

Elsewhere on the web, I found this lovely snippet:

html_escape_table = {
    "&": "&amp;",
    '"': "&quot;",
    "'": "&#39;",
    ">": "&gt;",
    "<": "&lt;",
    }

def html_escape_orig(text):
    parts = []
    for c in text:
        parts.append(html_escape_table.get(c,c))
    return ''.join(parts)

FAIL. Python is not C. It's not so hot at character-by-character iteration. Also, let's say you have a 100-character input. Then you are going to make 100 calls to get, 100 calls to append and, depending on what the guts of Python do, 100 mini one-character strings.

The performance? I whipped up some sample runs using cProfile. The first method, using replace, ran:

6002 function calls in 0.040 CPU seconds

versus, the character by character method:

1184002 function calls in 6.331 CPU seconds

Yeah, that's about 200x fewer function calls, and roughly 160x faster.
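If you want to try this on your own machine, here is a rough sketch using timeit instead of cProfile. cProfile adds overhead to every function call, so it tends to exaggerate the gap for the character-by-character version; timeit should give a fairer (probably smaller) ratio. The sample string, the repeat count and the function names are my own assumptions for illustration, and the absolute numbers will vary with your machine and Python version.

```python
import timeit

ESCAPES = {'&': '&amp;', '"': '&quot;', "'": '&#39;', '>': '&gt;', '<': '&lt;'}


def escape_replace(text):
    # One str.replace call per special character; '&' goes first.
    for char in ('&', '"', "'", '>', '<'):
        text = text.replace(char, ESCAPES[char])
    return text


def escape_per_char(text):
    # Character-by-character lookup, as in the snippet above.
    parts = []
    for c in text:
        parts.append(ESCAPES.get(c, c))
    return ''.join(parts)


# Assumed sample input: a short HTML fragment repeated 50 times.
SAMPLE = '<p class="x">Tom & Jerry\'s</p>' * 50

# Both versions must agree before the timings mean anything.
assert escape_replace(SAMPLE) == escape_per_char(SAMPLE)

for func in (escape_replace, escape_per_char):
    seconds = timeit.timeit(lambda: func(SAMPLE), number=200)
    print(func.__name__, round(seconds, 4))
```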

And one more thing

Poking around a bit further, you can do the same thing (although a touch slower) with:

from xml.sax.saxutils import escape
def html_escape(text):
    return escape(text, {'"': '&quot;', "'": '&#39;'})

The code for escape is just a bunch of replace calls, plus an iteration through the dictionary to do the extra replacements.
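To see what the extra dictionary buys you: on its own, escape from xml.sax.saxutils only handles &, < and >, and it leaves quotes untouched. Here is a quick sketch comparing the two calls (the sample string is just something I made up):

```python
from xml.sax.saxutils import escape


def html_escape(text):
    # The second argument maps extra characters to their replacements.
    return escape(text, {'"': '&quot;', "'": '&#39;'})


sample = '"Tom" & <Jerry>'

print(escape(sample))
# "Tom" &amp; &lt;Jerry&gt;

print(html_escape(sample))
# &quot;Tom&quot; &amp; &lt;Jerry&gt;
```

The quote escaping matters mostly when the text ends up inside an HTML attribute value.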


Comment 2008-12-07 by None

Thank you, quick and easy way to escape html :)


Comment 2009-06-04 by None

Or simply use the escape function in the cgi module.