How do I validate xml against a DTD file in Python

Another good option is lxml’s validation, which I find quite pleasant to use. A simple example taken from the lxml site:

    from io import StringIO  # Python 3 location of StringIO
    from lxml import etree

    # A DTD that only allows <foo> as an empty element
    dtd = etree.DTD(StringIO("""<!ELEMENT foo EMPTY>"""))

    root = etree.XML("<foo/>")
    print(dtd.validate(root))  # True

    root = etree.XML("<foo>bar</foo>")
    print(dtd.validate(root))  # False
    print(dtd.error_log.filter_from_errors())
    # <string>:1:0:ERROR:VALID:DTD_NOT_EMPTY: Element foo was declared EMPTY this …
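Because the question is about a DTD *file*, here is a minimal sketch of the same lxml approach pointed at a DTD on disk. It relies on `etree.DTD` accepting a filename as well as a file object. The helper name `validate_against_dtd_file`, the file name `note.dtd` and its element declarations are all hypothetical, made up for illustration.

```python
import os
import tempfile
from lxml import etree

def validate_against_dtd_file(xml_text, dtd_path):
    """Validate an XML string against a DTD stored on disk (hypothetical helper).

    Returns (is_valid, errors), where errors lists the validator's messages.
    """
    dtd = etree.DTD(dtd_path)          # etree.DTD accepts a filename or a file object
    root = etree.fromstring(xml_text)  # parse the document to be checked
    is_valid = dtd.validate(root)
    errors = [str(e) for e in dtd.error_log.filter_from_errors()]
    return is_valid, errors

# Demo with a throwaway DTD file; "note.dtd" and its content are invented for illustration.
with tempfile.TemporaryDirectory() as tmp:
    dtd_path = os.path.join(tmp, "note.dtd")
    with open(dtd_path, "w", encoding="utf-8") as f:
        f.write("<!ELEMENT note (to, body)>\n"
                "<!ELEMENT to (#PCDATA)>\n"
                "<!ELEMENT body (#PCDATA)>\n")
    good = validate_against_dtd_file("<note><to>Ann</to><body>Hi</body></note>", dtd_path)
    bad = validate_against_dtd_file("<note><body>Hi</body></note>", dtd_path)  # <to> missing

print(good)    # (True, [])
print(bad[0])  # False
```

Returning the messages alongside the boolean keeps the helper usable in scripts that need to report *why* a document failed.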

Make DocumentBuilder.parse ignore DTD references

Try setting features on the DocumentBuilderFactory:

    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;

    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

    dbf.setValidating(false);
    dbf.setNamespaceAware(true);
    dbf.setFeature("http://xml.org/sax/features/namespaces", false);
    dbf.setFeature("http://xml.org/sax/features/validation", false);
    // Xerces-specific: don't build the DTD grammar or fetch the external DTD
    dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
    dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

    DocumentBuilder db = dbf.newDocumentBuilder();
    …

Ultimately, I think the options are specific to the parser implementation. The Xerces2 documentation on parser features may help.
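If you need the same behaviour from Python rather than Java, a rough lxml counterpart (a swapped-in technique, not part of the answer above) is an `XMLParser` configured not to load or validate against the DTD named in the DOCTYPE. `load_dtd`, `dtd_validation`, `no_network` and `resolve_entities` are real `lxml.etree.XMLParser` keyword arguments; the helper `parse_ignoring_dtd` and the path `missing.dtd` are hypothetical.

```python
from lxml import etree

# Parser that neither fetches nor validates against the external DTD.
parser = etree.XMLParser(
    load_dtd=False,          # do not load the external DTD subset
    dtd_validation=False,    # do not validate while parsing
    no_network=True,         # never fetch resources over the network
    resolve_entities=False,  # leave entity references unresolved
)

def parse_ignoring_dtd(xml_bytes):
    """Parse XML whose DOCTYPE points at a DTD that may be unavailable (hypothetical helper)."""
    return etree.fromstring(xml_bytes, parser)

# "missing.dtd" is a hypothetical path that does not exist on disk.
doc = b'<?xml version="1.0"?><!DOCTYPE note SYSTEM "missing.dtd"><note>hi</note>'
root = parse_ignoring_dtd(doc)
print(root.tag, root.text)  # note hi
```

As with the Java features, anything that is defined only in the external DTD (such as entity declarations) will not be available when the DTD is skipped.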

Why are nested anchor tags illegal?

Keep in mind that an anchor isn’t just a link; it’s also something to which one can link. (Though the former use is far more common than the latter.) Quoting the W3C (old, but relevant): “An anchor is a piece of text which marks the beginning and/or the end of a hypertext link.” To that end, …

Installing xmllint

I had the same problem, and it took me two hours to make it work. Download iconv, libxml2, libxmlsec, and zlib from ftp://ftp.zlatkovic.com/libxml/. Extract each zip file, then copy all the files in the bin folder of each download. Paste the files into one folder (mine = XML). Add the C:\folderName (mine = C:\XML) in …
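Once the folder is on your PATH, a quick way to confirm that xmllint is actually found is to look it up and ask for its version. This is a minimal sketch using only the standard library; the helper names are hypothetical. The second helper uses xmllint's real `--noout` and `--dtdvalid` options to run a DTD check and only looks at the exit status.

```python
import shutil
import subprocess

def xmllint_version():
    """Return xmllint's version banner, or None if xmllint is not on PATH (hypothetical helper)."""
    exe = shutil.which("xmllint")  # searches the directories listed in PATH
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    # xmllint has typically written its version banner to stderr, so check both streams
    return (result.stderr or result.stdout).strip()

def dtd_valid_with_xmllint(xml_path, dtd_path):
    """Return True if `xmllint --noout --dtdvalid DTD XML` exits with status 0 (hypothetical helper)."""
    result = subprocess.run(
        ["xmllint", "--noout", "--dtdvalid", dtd_path, xml_path],
        capture_output=True, text=True,
    )
    return result.returncode == 0

print(xmllint_version() or "xmllint not found on PATH")
```

If the first call prints "not found", the PATH change has not taken effect yet (a new console window is usually needed on Windows).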

How to choose between DTD and XSD

It’s probably worth learning DTDs as a separate exercise, both so you know how they work in case you encounter them elsewhere and so you can appreciate some of the problems XSD was trying to solve. However, for your current purpose of describing an XML document, do stick with XSDs. …
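One concrete thing XSD solves that DTDs cannot is datatypes: a DTD can only say that an element holds character data (`#PCDATA`), while an XSD can require, for example, an integer. A minimal lxml sketch of that difference, assuming a made-up `age` element and a hypothetical `check` helper:

```python
from io import StringIO
from lxml import etree

# DTD: <age> may contain any text at all.
dtd = etree.DTD(StringIO("<!ELEMENT age (#PCDATA)>"))

# XSD: <age> must contain an integer.
xsd = etree.XMLSchema(etree.XML(
    b'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    b'<xs:element name="age" type="xs:integer"/>'
    b'</xs:schema>'
))

def check(xml_text):
    """Return (valid per DTD, valid per XSD) for one document (hypothetical helper)."""
    root = etree.XML(xml_text)
    return dtd.validate(root), xsd.validate(root)

print(check("<age>42</age>"))         # (True, True)
print(check("<age>forty-two</age>"))  # (True, False): only the XSD catches the bad value
```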

Where is the HTML5 Document Type Definition?

There is no HTML5 DTD. The HTML5 RC explicitly says this when discussing XHTML serialization, and this clearly applies to HTML serialization as well. DTDs have been regarded by the designers of HTML5 as too limited in expressive power, and HTML5 validators (basically the HTML5 mode of http://validator.nu and its copy at http://validator.w3.org/nu/) use schemas …