The short answer is do this:
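def html_escape(text):
    # '&' must be replaced first, or the '&' in the entities added below
    # would be escaped a second time.
    text = text.replace('&', '&amp;')
    text = text.replace('"', '&quot;')
    text = text.replace("'", '&#39;')
    text = text.replace(">", '&gt;')
    text = text.replace("<", '&lt;')
    return text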
I know, I know, it seems kinda "unsophisticated" and too simple. But I tell you, it's the fastest way I found.
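One detail that is easy to miss in the replace chain is the order: the ampersand has to go first. Here is a minimal sketch of why. The entity strings are the standard HTML ones, and html_escape_wrong_order and the sample string are made up purely for illustration.

```python
def html_escape(text):
    # The replace chain from the short answer; '&' is handled first.
    text = text.replace('&', '&amp;')
    text = text.replace('"', '&quot;')
    text = text.replace("'", '&#39;')
    text = text.replace(">", '&gt;')
    text = text.replace("<", '&lt;')
    return text


def html_escape_wrong_order(text):
    # Hypothetical broken variant: '&' is replaced after '<',
    # so the '&' that belongs to '&lt;' gets escaped again.
    text = text.replace("<", '&lt;')
    text = text.replace('&', '&amp;')
    return text


sample = '<a href="x">Tom & Jerry\'s</a>'
print(html_escape(sample))
# &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#39;s&lt;/a&gt;

print(html_escape_wrong_order('<'))
# &amp;lt;
```

None of the entities contain a quote or an angle bracket, so once the ampersand is out of the way the remaining four replacements can run in any order.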
How not to do it
Elsewhere on the web, I found this lovely snippet:
html_escape_table = {
    "&": "&amp;",
    '"': "&quot;",
    "'": "&#39;",
    ">": "&gt;",
    "<": "&lt;",
}

def html_escape_orig(text):
    parts = []
    for c in text:
        # One dictionary lookup and one append per input character.
        parts.append(html_escape_table.get(c, c))
    return ''.join(parts)
FAIL. Python is not C. It's not so hot at character-by-character iteration. Also, let's say you have a 100-character input. Then you are going to make 100 calls to get, 100 calls to append and, depending on what the guts of Python do, 100 mini one-character strings.
The performance? I whipped up some sample runs using cProfile. The first method using replace ran
6002 function calls in 0.040 CPU seconds
versus, the character by character method:
1184002 function calls in 6.331 CPU seconds
Yeah, that's about 200x fewer function calls, and roughly 160x faster in CPU time.
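If you want to check this on your own machine, here is a rough sketch of a comparison. Note that it uses timeit rather than the cProfile runs quoted above, and the sample input and repeat count are arbitrary choices of mine, so expect different absolute numbers (and probably a different ratio).

```python
import timeit

html_escape_table = {
    "&": "&amp;",
    '"': "&quot;",
    "'": "&#39;",
    ">": "&gt;",
    "<": "&lt;",
}


def html_escape(text):
    # Five whole-string replace calls, '&' first.
    text = text.replace('&', '&amp;')
    text = text.replace('"', '&quot;')
    text = text.replace("'", '&#39;')
    text = text.replace(">", '&gt;')
    text = text.replace("<", '&lt;')
    return text


def html_escape_orig(text):
    # Character-by-character version: one lookup and one append per character.
    parts = []
    for c in text:
        parts.append(html_escape_table.get(c, c))
    return ''.join(parts)


# Hypothetical input: a small snippet repeated to get a few thousand characters.
sample = '<p class="x">Tom & Jerry\'s</p>' * 100

# Both versions should agree before we bother timing them.
assert html_escape(sample) == html_escape_orig(sample)

runs = 200  # arbitrary repeat count
t_replace = timeit.timeit(lambda: html_escape(sample), number=runs)
t_chars = timeit.timeit(lambda: html_escape_orig(sample), number=runs)

print("replace chain: %.4f s" % t_replace)
print("char by char:  %.4f s" % t_chars)
```

timeit adds far less per-call overhead than a profiler does, so the gap it reports will likely be smaller than the profiled one, but the replace chain should still come out well ahead.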
And one more thing
Poking around a bit further, you can do the same thing (although a touch slower) with:
from xml.sax.saxutils import escape
def html_escape(text):
    # escape() already handles &, < and >; the dict adds the two quote characters.
    return escape(text, {'"': '&quot;', "'": '&#39;'})
The code for escape is just a bunch of replace statements, plus an iteration through the dictionary to do the other replacements.
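A quick sanity check that the two approaches really produce the same output, as a sketch. The names html_escape_replace and html_escape_sax, and the sample strings, are mine and only there so both versions can sit side by side.

```python
from xml.sax.saxutils import escape


def html_escape_replace(text):
    # The plain replace chain, '&' first.
    text = text.replace('&', '&amp;')
    text = text.replace('"', '&quot;')
    text = text.replace("'", '&#39;')
    text = text.replace(">", '&gt;')
    text = text.replace("<", '&lt;')
    return text


def html_escape_sax(text):
    # escape() covers &, < and >; the extra dict covers both quote characters.
    return escape(text, {'"': '&quot;', "'": '&#39;'})


samples = [
    'Tom & "Jerry\'s" <b>',
    'no special characters here',
    '&amp; is already an entity',
]

for s in samples:
    assert html_escape_replace(s) == html_escape_sax(s)

print(html_escape_sax(samples[0]))
# Tom &amp; &quot;Jerry&#39;s&quot; &lt;b&gt;
```

Note that neither version tries to detect text that is already escaped: the third sample comes out as &amp;amp; either way.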
2 comments:
Thank you, quick and easy way to escape html :)
Or simply use the escape function in the cgi module.