lxml – Page 4 – Tarik Billa

Python: Using xpath locally / on a specific element

April 20, 2023 by Tarik

Your XPath starts with a slash (/) and is therefore absolute. Add a dot (.) in front to make it relative to the current element, i.e.

    links = table.xpath(".//a[contains(@href, 'http://www.example.com/filter/')]")
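To make the difference concrete, here is a minimal sketch. The two-table document and the variable names are made up for illustration; only the XPath expression comes from the answer above.

```python
from lxml import etree

# Hypothetical document: the matching link lives in the SECOND table only.
doc = etree.fromstring(
    "<html><body>"
    "<table id='first'><tr><td>"
    "<a href='http://www.example.com/other/1'>x</a>"
    "</td></tr></table>"
    "<table id='second'><tr><td>"
    "<a href='http://www.example.com/filter/2'>y</a>"
    "</td></tr></table>"
    "</body></html>"
)
first_table = doc.xpath("//table[@id='first']")[0]

# Absolute path: '//' starts at the document root, even when called on first_table.
absolute = first_table.xpath("//a[contains(@href, 'http://www.example.com/filter/')]")

# Relative path: './/' only looks at descendants of first_table.
relative = first_table.xpath(".//a[contains(@href, 'http://www.example.com/filter/')]")

print(len(absolute))  # 1 (found in the other table)
print(len(relative))  # 0 (nothing inside first_table matches)
```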

lxml etree xmlparser remove unwanted namespace

April 12, 2023 by Tarik

    import io
    import lxml.etree as ET

    # io.BytesIO needs bytes, hence the b prefix
    content = b'''\
    <Envelope xmlns="http://www.example.com/zzz/yyy">
      <Header>
        <Version>1</Version>
      </Header>
      <Body>
        some stuff
      </Body>
    </Envelope>
    '''
    dom = ET.parse(io.BytesIO(content))

You can find namespace-aware nodes using the xpath method:

    body = dom.xpath('//ns:Body', namespaces={'ns': 'http://www.example.com/zzz/yyy'})
    print(body)
    # [<Element {http://www.example.com/zzz/yyy}Body at 90b2d4c>]

If you really want to remove namespaces, you could use an XSL transformation:

    # http://wiki.tei-c.org/index.php/Remove-Namespaces.xsl
    xslt = '''<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> …
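The stylesheet is cut off in this excerpt, so the sketch below does not reproduce it. It shows a different, pure-Python way to strip namespaces from element tags, using lxml's QName helper and cleanup_namespaces. It assumes only element names need to be handled; namespaced attributes are left untouched.

```python
import io
import lxml.etree as ET

content = b'''<Envelope xmlns="http://www.example.com/zzz/yyy">
<Header><Version>1</Version></Header>
<Body>some stuff</Body>
</Envelope>'''
dom = ET.parse(io.BytesIO(content))

for el in dom.iter():
    # Comments and processing instructions have a non-string tag; skip them.
    if isinstance(el.tag, str):
        el.tag = ET.QName(el).localname  # '{ns}Body' -> 'Body'

# Drop namespace declarations that are no longer used.
ET.cleanup_namespaces(dom)

body = dom.xpath('//Body')  # no prefix or namespaces mapping needed now
print(body[0].text)
```

Whether this is preferable to XSLT depends on the document; for a one-off cleanup of tag names it is usually the shorter route.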

What are the differences between lxml and ElementTree?

March 30, 2023 by Tarik

ElementTree comes built in with the Python standard library, alongside other data-format modules such as json and csv. This means the module ships with every installation of Python. For most normal XML operations, including building document trees and simple searching and parsing of element attributes and node values, even namespaces, ElementTree is a reliable …
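As a rough illustration of those everyday operations, the following sketch uses only the standard library. The element names and the urn:example namespace are invented for the example.

```python
import xml.etree.ElementTree as ET

# Build a small document tree.
root = ET.Element("catalog")
book = ET.SubElement(root, "book", attrib={"id": "b1"})
ET.SubElement(book, "title").text = "Example Title"

# Simple searching with the XPath subset that ElementTree supports.
found = root.find("./book[@id='b1']/title")
print(found.text)

# Reading an attribute value.
print(book.get("id"))

# Namespace-aware lookup using a prefix mapping.
ns_doc = ET.fromstring('<r xmlns="urn:example"><c>inside</c></r>')
ns_child = ns_doc.find("ns:c", {"ns": "urn:example"})
print(ns_child.text)
```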

Parsing HTML in python – lxml or BeautifulSoup? Which of these is better for what kinds of purposes?

March 24, 2023 by Tarik

PyQuery provides the jQuery selector interface to Python (using lxml under the hood): http://pypi.python.org/pypi/pyquery It's really awesome; I don't use anything else anymore.

selecting attribute values from lxml

March 21, 2023 by Tarik

find and findall only implement a subset of XPath. They exist to provide compatibility with other ElementTree implementations (such as ElementTree and cElementTree). The xpath method, in contrast, provides full access to XPath 1.0:

    print(customer.xpath('./@NAME')[0])

However, you could instead use get:

    print(customer.get('NAME'))

or attrib:

    print(customer.attrib['NAME'])
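The three approaches behave differently when the attribute is missing, which may matter when choosing between them. A small sketch follows; the CUSTOMER element and the AGE attribute are assumed for illustration.

```python
from lxml import etree

customer = etree.fromstring('<CUSTOMER NAME="Alice"/>')

# All three return the same value when the attribute exists.
name_xpath = customer.xpath('./@NAME')[0]
name_get = customer.get('NAME')
name_attrib = customer.attrib['NAME']

# For a missing attribute they differ:
missing_xpath = customer.xpath('./@AGE')          # empty list, so [0] would raise IndexError
missing_get = customer.get('AGE')                 # None
missing_default = customer.get('AGE', 'unknown')  # caller-supplied default
try:
    customer.attrib['AGE']
    raised = False
except KeyError:
    raised = True                                 # dict-style access raises KeyError

print(name_xpath, name_get, name_attrib)
print(missing_xpath, missing_get, missing_default, raised)
```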

finding elements by attribute with lxml

March 20, 2023 by Tarik

You can use xpath, e.g.

    root.xpath("//article[@type='news']")

This XPath expression returns a list of all <article/> elements whose "type" attribute has the value "news". You can then iterate over it to do what you want, or pass it wherever you need it. To get just the text content, you can extend the XPath like so:

    root = etree.fromstring(""" <root> …
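The original example is cut off here, so the following is only a sketch of the idea with a made-up document: appending /text() to the expression returns the text nodes instead of the elements.

```python
from lxml import etree

# Hypothetical input; the original post's document is truncated.
root = etree.fromstring("""
<root>
  <articles>
    <article type="news">Breaking story</article>
    <article type="info">Background piece</article>
    <article type="news">Another story</article>
  </articles>
</root>
""")

# Elements whose type attribute equals "news".
articles = root.xpath("//article[@type='news']")

# Same selection, extended with text() to get just the text content.
texts = root.xpath("//article[@type='news']/text()")
print(texts)  # ['Breaking story', 'Another story']
```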

How to find recursively for a tag of XML using LXML?

March 19, 2023 by Tarik

You can use XPath to search recursively:

    >>> from lxml import etree
    >>> q = etree.fromstring('<xml><hello>a</hello><x><hello>b</hello></x></xml>')
    >>> q.findall('hello')     # Tag name, first level only.
    [<Element hello at 414a7c8>]
    >>> q.findall('.//hello')  # XPath, recursive.
    [<Element hello at 414a7c8>, <Element hello at 414a818>]

Write xml file using lxml library in Python

March 5, 2023 by Tarik

You can get a string from the element and then write that (from the lxml tutorial):

    xml_string = etree.tostring(root, pretty_print=True)

Look at the tostring documentation to set the encoding. This was written in Python 2; Python 3 gives a binary string back, which can be written directly to a file but is probably not what you …
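A minimal sketch of both routes under Python 3 follows. The element names, the temporary output path, and the choice of UTF-8 are assumptions for the example, not part of the original answer.

```python
import os
import tempfile
from lxml import etree

root = etree.Element("root")
etree.SubElement(root, "child").text = "value"

# Hypothetical output location in a fresh temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "output.xml")

# Route 1: tostring returns bytes here, so open the file in binary mode.
with open(out_path, "wb") as f:
    f.write(etree.tostring(root, pretty_print=True,
                           xml_declaration=True, encoding="UTF-8"))

# Route 2: wrap the element in an ElementTree and let it write the file.
etree.ElementTree(root).write(out_path, pretty_print=True,
                              xml_declaration=True, encoding="UTF-8")

# Read the file back to confirm it is well-formed.
reloaded = etree.parse(out_path).getroot()
print(reloaded.find("child").text)
```

If a text string is wanted instead of bytes, tostring also accepts encoding="unicode", in which case no XML declaration can be requested.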

lxml installation error ubuntu 14.04 (internal compiler error)

February 22, 2023 by Tarik

A possible solution (if you cannot increase the memory on that machine) is to add a swap file:

    sudo dd if=/dev/zero of=/swapfile bs=1024 count=524288
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile

From https://github.com/pydata/pandas/issues/1880#issuecomment-9920484

This worked for me on the smallest DigitalOcean machine.

how to remove an element in lxml

January 18, 2023 by Tarik

Use the remove method of an xmlElement:

    tree = et.fromstring(xml)
    for bad in tree.xpath("//fruit[@state='rotten']"):
        # grab the parent of the element and call remove directly on it
        bad.getparent().remove(bad)
    print(et.tostring(tree, pretty_print=True, xml_declaration=True).decode())

If I had to compare with the @Acorn version, mine will work even if the elements to remove are not directly …
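For a version that runs on its own, here is a sketch with an invented grocery document in which one rotten fruit sits at the top level and another is nested inside a basket. The parent is looked up per element, so nesting depth does not matter.

```python
from lxml import etree as et

# Hypothetical input: rotten fruit at two different depths.
xml = """
<groceries>
  <fruit state="rotten">apple</fruit>
  <basket>
    <fruit state="fresh">pear</fruit>
    <fruit state="rotten">banana</fruit>
  </basket>
</groceries>
"""
tree = et.fromstring(xml)

# xpath returns a list, so removing while looping over it is safe.
for bad in tree.xpath("//fruit[@state='rotten']"):
    bad.getparent().remove(bad)

remaining = [fruit.text for fruit in tree.iter("fruit")]
print(remaining)  # ['pear']
```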