Preventing RSS exploits with Radio

I'm working on a Radio script that addresses RSS exploits.

Mark Pilgrim's suggestion of weeding out the unsafe HTML seems futile. Instead, the script removes all HTML tags and attributes other than a small subset that can't be abused: P, B, I, BR, and BLOCKQUOTE (all without attributes), A (with HREF only), and IMG (with SRC, ALT, HEIGHT, and WIDTH only). I'm hoping the script also has the side benefit of making RSS entries easier to read.
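The script itself is written for Radio, so the following is not its code. It is only a minimal sketch of the same allowlist idea in Python, using the standard library's html.parser. The names SAFE_TAGS, AllowlistFilter, and strip_unsafe_html are made up for illustration. The sketch assumes that the text inside a dropped tag should be kept and only the markup removed.

```python
from html import escape
from html.parser import HTMLParser

# Allowlist from the post: tag name -> attributes that may be kept.
# (Hypothetical name; the real script's data structure is not shown.)
SAFE_TAGS = {
    "p": set(), "b": set(), "i": set(), "br": set(), "blockquote": set(),
    "a": {"href"},
    "img": {"src", "alt", "height", "width"},
}

VOID_TAGS = ("br", "img")  # these never get a closing tag


class AllowlistFilter(HTMLParser):
    """Rebuilds the input, keeping only allowed tags and attributes."""

    def __init__(self, safe_tags=SAFE_TAGS):
        super().__init__(convert_charrefs=True)
        self.safe_tags = safe_tags
        self.out = []

    def handle_starttag(self, tag, attrs):
        allowed = self.safe_tags.get(tag)
        if allowed is None:
            return  # unknown tag: drop the markup, keep the text inside
        kept = "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name in allowed
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in self.safe_tags and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives unescaped, so escape it again on the way out.
        self.out.append(escape(data, quote=False))


def strip_unsafe_html(text, safe_tags=SAFE_TAGS):
    parser = AllowlistFilter(safe_tags)
    parser.feed(text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


if __name__ == "__main__":
    print(strip_unsafe_html('<p onclick="x()">Hi <b>there</b></p>'))
```

Because the allowlist is an ordinary dictionary passed as a parameter, a caller could widen it (for example, by adding UL, OL, and LI) without touching the filter. Note that this sketch does not inspect the URLs in HREF or SRC, and it does not balance unclosed tags. A production filter would probably want to do both.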

The script works on the text of entries, but I can't find a way to make it work with the storyArrived callback. I've begun a discussion on the radio-dev mailing list, in case anyone has tackled this problem before.

Comments

What about strong, em, code, kbd, ul, ol, li, h[1-6], div, abbr, del, ...? And the longdesc or title attributes? The main reason I don't like "safe subsets" is that people usually exclude perfectly safe tags for no apparent reason. I really don't want to have to use deprecated tags like tt and s when there are superior replacements.

The way the code is written, users will be able to define safe tags and attributes on their own. I chose my subset simply because it's the same one I use on some PHP message board software. Of the ones you've listed, the most glaring omissions are OL, UL, and LI.
