Based on my rudimentary understanding of this situation, any XML format that has no document type definition of its own (such as OPML) must declare every entity it references other than the five predefined ones: &amp;, &apos;, &gt;, &lt;, and &quot;.
For example, this OPML outline crashes OPML Link Publisher because it refers to &eacute; and &Euml; and does not declare them. The problem can be fixed by editing the outline to add this declaration above the root element:
<!DOCTYPE opml [
<!ENTITY eacute "&#233;">
<!ENTITY Euml "&#203;">
]>
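To check the behavior outside OPML Link Publisher, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. The one-item outline and the outline_text helper are made up for illustration, and the DOCTYPE is an entity declaration like the one above. Other parsers may report the error differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-item outline that uses &eacute; without declaring it.
BODY = '<opml version="1.1"><body><outline text="caf&eacute;"/></body></opml>'

# An internal subset like the one above, declaring the entity.
DOCTYPE = '<!DOCTYPE opml [\n<!ENTITY eacute "&#233;">\n]>\n'


def outline_text(xml_source):
    """Parse an OPML string and return the text of its first outline."""
    root = ET.fromstring(xml_source)
    return root.find("body/outline").get("text")


try:
    outline_text(BODY)
    undeclared_ok = True
except ET.ParseError as err:
    # The parser treats the undeclared entity as a fatal error.
    undeclared_ok = False
    print("rejected:", err)

# With the declaration in place, the same outline parses.
print(outline_text(DOCTYPE + BODY))
```

The first call should be rejected with an "undefined entity" error, while the second should return the outline text with the accented character expanded.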
As an alternative, it can be fixed for a much larger number of entities by adding this declaration:
<!DOCTYPE opml [
<!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
%HTMLlat1;
]>
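In case the parser in use does not fetch external entity sets, a similar result might be had by writing the Latin-1 declarations directly into the internal subset. The sketch below assumes Python and its html.entities.name2codepoint table as a stand-in for xhtml-lat1.ent. The latin1_doctype helper and the sample outline are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.entities import name2codepoint


def latin1_doctype(root_name="opml"):
    """Build a DOCTYPE whose internal subset declares the Latin-1 entities."""
    lines = ["<!DOCTYPE %s [" % root_name]
    for name, codepoint in sorted(name2codepoint.items()):
        # Keep only the Latin-1 supplement range (nbsp through yuml).
        if 160 <= codepoint <= 255:
            lines.append('<!ENTITY %s "&#%d;">' % (name, codepoint))
    lines.append("]>")
    return "\n".join(lines) + "\n"


# Hypothetical outline using the two entities mentioned in the post.
BODY = '<opml version="1.1"><body><outline text="&eacute; &Euml;"/></body></opml>'

root = ET.fromstring(latin1_doctype() + BODY)
print(root.find("body/outline").get("text"))
```

The trade-off is a longer prolog in every outline in exchange for not depending on a network fetch of the W3C entity file.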
As I consider solutions, I'm looking for XML gurus and OPML software developers who have dealt with this problem. My initial perspective is that we should be able to count on receiving well-formed XML from OPML authoring programs.
If I'm not mistaken, you're looking for solutions to "validity", not "well-formedness". A well-formed document is one that has all its tags matched up and starts with the right header. A valid document is one that is well-formed, has a DTD, and complies with it.
OPML will never be able to be valid because it's a non-spec. Part of the description of OPML is that the set of attributes isn't finite, so anyone can add another attribute. It's almost not XML.
-Russ
That was my initial thought too, but according to the XML spec, undeclared entity references in a document without a DTD are a well-formedness problem, not a validity problem. I can live with the lack of validation.
Rogers,
This is (was) a well-debated weakness of JDOM/XOM and their tight constraints based on the notion that it should be impossible to generate ill-formed XML.
What you may want to look at is the email I sent you, with a zip containing some code I wrote about a year ago for an OPML browser that overcame this exact limitation of JDOM.
Regards from a fellow North East Floridian,
Les