Remove namespace and prefix from XML in Python using lxml

We can get the desired output document in two steps:

  1. Remove namespace URIs from element names
  2. Remove unused namespace declarations from the XML tree

Example code

from lxml import etree

input_xml = """
<package xmlns="http://apple.com/itunes/importer">
  <provider>some data</provider>
  <language>en-GB</language>
  <!-- some comment -->
  <?xml-some-processing-instruction ?>
</package>
"""
root = etree.fromstring(input_xml)

# Iterate through all XML elements
for elem in root.iter():
    # Skip comments and processing instructions,
    # because they do not have names
    if not (
        isinstance(elem, etree._Comment)
        or isinstance(elem, etree._ProcessingInstruction)
    ):
        # Remove a namespace URI in the element's name
        elem.tag = etree.QName(elem).localname

# Remove unused namespace declarations
etree.cleanup_namespaces(root)

print(etree.tostring(root).decode())

Output XML

<package>
  <provider>some data</provider>
  <language>en-GB</language>
  <!-- some comment -->
  <?xml-some-processing-instruction ?>
</package>

Details explaining the code

As described in the documentation, we use lxml.etree.QName.localname to get the local names of elements, that is, their names without namespace URIs. We then replace each element's fully qualified name with its local name.
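As a small illustration of what QName returns, here is a sketch that builds a QName directly from a fully qualified name string instead of from an element. The namespace URI is the one from the example input; the variable names are only for illustration.

```python
from lxml import etree

# Fully qualified name in Clark notation: {namespace-uri}local-name
qname = etree.QName("{http://apple.com/itunes/importer}package")

print(qname.namespace)  # http://apple.com/itunes/importer
print(qname.localname)  # package
print(qname.text)       # {http://apple.com/itunes/importer}package

# A name without a namespace has no URI and is its own local name
plain = etree.QName("provider")
print(plain.namespace)  # None
print(plain.localname)  # provider
```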

Some nodes in the XML tree, such as comments and processing instructions, do not have names. We have to skip them while replacing element names; otherwise, a ValueError is raised.
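An alternative way to detect such nodes, not used in the example code above, is to check whether the tag is a string. This is only a sketch: it assumes that in lxml, comments and processing instructions expose a factory function (etree.Comment, etree.ProcessingInstruction) as their tag rather than a name, which avoids referring to the underscore-prefixed classes. The sample document here is made up for the demonstration.

```python
from lxml import etree

doc = etree.fromstring(
    "<root><!-- a comment --><?some-target some data?><child/></root>"
)

# Regular elements have a string tag; comments and processing
# instructions expose a factory function as their tag instead
element_tags = [node.tag for node in doc.iter() if isinstance(node.tag, str)]
skipped = [node for node in doc.iter() if not isinstance(node.tag, str)]

print(element_tags)  # ['root', 'child']
print(len(skipped))  # 2
```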

Finally, we use lxml.etree.cleanup_namespaces() to remove unused namespace declarations from the XML tree.
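To see the effect of cleanup_namespaces() in isolation, here is a minimal sketch with a made-up document that declares a prefix nobody uses. The URI http://example.com/a is a placeholder, not something from the original question.

```python
from lxml import etree

# The prefix "a" is declared but never used by any element or attribute
doc = etree.fromstring('<root xmlns:a="http://example.com/a"><child/></root>')

before = etree.tostring(doc).decode()
etree.cleanup_namespaces(doc)
after = etree.tostring(doc).decode()

print(before)  # <root xmlns:a="http://example.com/a"><child/></root>
print(after)   # <root><child/></root>
```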

Note on namespaced XML attributes

If the XML input contains attributes with explicit namespace prefixes, the example code will not remove those prefixes. To strip them as well, add the following for loop directly after the line elem.tag = etree.QName(elem).localname, at the same indentation level, as suggested here:

        # Iterate over a copy of the names, since attributes are renamed below
        for attr_name in list(elem.attrib):
            local_attr_name = etree.QName(attr_name).localname
            if attr_name != local_attr_name:
                attr_value = elem.attrib[attr_name]
                del elem.attrib[attr_name]
                elem.attrib[local_attr_name] = attr_value
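Putting both parts together, the whole procedure could be wrapped in one helper. This is a sketch under a few assumptions: the function name strip_namespaces, the sample document, and the x prefix with its URI are hypothetical, and the string-tag check replaces the isinstance test on comment and processing instruction classes. Note that if an element carries both a prefixed and an unprefixed attribute with the same local name, the renamed attribute overwrites the other one.

```python
from lxml import etree


def strip_namespaces(root):
    """Remove namespace URIs from element and attribute names, in place."""
    for elem in root.iter():
        # Comments and processing instructions have a non-string tag
        if not isinstance(elem.tag, str):
            continue
        elem.tag = etree.QName(elem).localname
        # Snapshot the attribute names because attributes are renamed below
        for attr_name in list(elem.attrib):
            local_attr_name = etree.QName(attr_name).localname
            if attr_name != local_attr_name:
                elem.attrib[local_attr_name] = elem.attrib.pop(attr_name)
    # Drop namespace declarations that are no longer referenced
    etree.cleanup_namespaces(root)
    return root


sample_xml = """
<package xmlns="http://apple.com/itunes/importer"
         xmlns:x="http://example.com/extra">
  <provider x:id="42">some data</provider>
  <!-- some comment -->
</package>
"""
root = strip_namespaces(etree.fromstring(sample_xml))
result = etree.tostring(root).decode()
print(result)
```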

To learn more about namespaced XML attributes see this answer.
