Converting HTML escape sequences in a string


Someone asked how to convert a string containing HTML escape sequences (like &lt;) to their unescaped equivalents (like '<'). Here's a quick function to do that for you...

function convertFromHTML(the_string) {
    createTextField("converter_txt", getNextHighestDepth(), 0, 0, 0, 0);
    converter_txt.html = true;
    converter_txt.htmlText = the_string;
    // store the text so it doesn't get lost when we remove the
    // text field
    var txt = converter_txt.text;
    converter_txt.removeTextField(); // clean up
    return txt;
}

// usage:
trace(convertFromHTML("This is some &lt;html&gt; escaped text &amp; stuff"));

WARNING: This was just thrown together and isn't thoroughly tested, so use it at your own risk. Additionally, it requires Flash Player 7 because of the getNextHighestDepth() call. You can remove that call and replace it with a numeric depth, but be careful which depth you select, since converter_txt will overwrite whatever is on that depth.

The algorithm is simple: make an HTML-enabled text field, put the HTML in it, then read the text field's text property. The text property holds the plain, unformatted text without HTML tags, regardless of whether the text field is HTML-enabled or not.

Enjoy!

8 Comments

  • In all honesty, this is why I wish inheriting class properties and methods worked, rather than having to create an instance of the TextField.

    For example, this should work too (but doesn't):
    function convertFromHTML(p_str:String):String {
    var temp_txt:TextField = new TextField();
    temp_txt.html = true;
    temp_txt.htmlText = p_str;
    var txt = temp_txt.text;
    return txt;
    }
    trace(convertFromHTML("Hello &lt;p&gt; and &amp;"));

     
  • Yeah, I know what you mean. You can make an instance of TextField with the new operator, but all of the properties default to undefined:

    t = new TextField();
    for (var i in t) {
    trace(t[i]);
    }

    However, if you make an instance with createTextField, all of the properties get their values correctly:

    createTextField("t", 1, 0, 0, 0, 0);
    for (var i in t) {
    trace(t[i]);
    }


    It's a little counter-intuitive, but I agree that it would be nice if it worked like you mentioned.

     
  • hmm, why would you go to all the trouble of creating a text field to do this when it can be done with simple String methods....

    public static function unescapeHTML(str:String){
    return (((str.split(">")).join(">")).split("<")).join("<");
    }

     
  • The html formatting screwed the code a bit...

    The splits are on the escaped &gt; and &lt; and the joins are on > and <

     
  • Blackmambe: What about ', ", &, Æ etc...

    The point is, there's a bunch of escape sequences (see: http://www.html-reference.com/Escape.htm), and having to split and join would require maybe 40 different lines of code.

    Not to mention, every time you do a split, the entire string has to be looked through. So that would be 40 times parsing the string, when in reality we should only need to look through the string once and replace characters as we find them.

    Letting the TextField do the work ensures that we do not miss any of the escape sequences, and lets the work be performed faster than dealing with string manipulation through code.

     
  • clever stuff

     
  • I always use unescape to convert HTML escape sequences in a string.

     
  • @colin: The unescape function is for converting URL-encoded values. To convert HTML-escaped values you need to use a function like mine, above.

    URL-encoded values start with a percent sign (%); HTML-escaped values start with an ampersand (&). They are used for different purposes: URL-encoding is necessary because a URL is limited to a certain set of characters, while HTML-escaping is necessary because characters like < and > have special meaning in HTML. You can find more information on Google.
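
The "look through the string once" idea from the comments can also be sketched without a text field. What follows is only a rough illustration, not a replacement for the function in the post: decodeEntities and ENTITY_TABLE are hypothetical names, the table lists just a handful of named entities, and the code has not been tested in Flash Player. It is written untyped, like the function in the post, and uses only core String methods.

```actionscript
// Hypothetical lookup table (an assumption for this sketch); a real one
// would need many more named entities than these five.
var ENTITY_TABLE = {lt: "<", gt: ">", amp: "&", quot: "\"", apos: "'"};

function decodeEntities(str) {
    var out = "";
    var pos = 0;
    while (pos < str.length) {
        var amp = str.indexOf("&", pos);
        if (amp == -1) {
            break; // no more ampersands; the tail is appended below
        }
        // copy the literal text that precedes the ampersand
        out += str.substring(pos, amp);
        var semi = str.indexOf(";", amp);
        var decoded = null;
        if (semi != -1) {
            var name = str.substring(amp + 1, semi);
            if (name.charAt(0) == "#") {
                // numeric form: &#65; (decimal) or &#x41; (hexadecimal)
                var isHex = name.charAt(1) == "x" || name.charAt(1) == "X";
                var code = isHex ? parseInt(name.substring(2), 16)
                                 : parseInt(name.substring(1), 10);
                if (!isNaN(code)) {
                    decoded = String.fromCharCode(code);
                }
            } else if (typeof ENTITY_TABLE[name] == "string") {
                decoded = ENTITY_TABLE[name];
            }
        }
        if (decoded != null) {
            out += decoded;
            pos = semi + 1; // skip past the whole escape sequence
        } else {
            out += "&"; // not a known sequence; keep the ampersand as-is
            pos = amp + 1;
        }
    }
    return out + str.substring(pos);
}
```

Because each ampersand is examined only once, trace(decodeEntities("&amp;lt;")) should print &lt; rather than <, which chained split/join calls can get wrong depending on their order. The text field approach still has the advantage that Flash supplies the list of escape sequences for you.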

     
