How can I view a text representation of an lxml element?

From http://lxml.de/tutorial.html#serialisation:

    >>> root = etree.XML('<root><a><b/></a></root>')
    >>> etree.tostring(root)
    b'<root><a><b/></a></root>'
    >>> print(etree.tostring(root, xml_declaration=True))
    <?xml version='1.0' encoding='ASCII'?>
    <root><a><b/></a></root>
    >>> print(etree.tostring(root, encoding='iso-8859-1'))
    <?xml version='1.0' encoding='iso-8859-1'?>
    <root><a><b/></a></root>
    >>> print(etree.tostring(root, pretty_print=True))
    <root>
      <a>
        <b/>
      </a>
    </root>
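On Python 3, `etree.tostring()` returns bytes by default, so printing the result shows a `b'...'` literal rather than readable markup. A minimal sketch of one way around that, assuming lxml is installed, is to pass `encoding="unicode"` so that a `str` comes back instead:

```python
from lxml import etree

root = etree.XML("<root><a><b/></a></root>")

# encoding="unicode" makes tostring() return str instead of bytes,
# which prints as readable, indented markup.
text = etree.tostring(root, encoding="unicode", pretty_print=True)
print(text)
```

With `encoding="unicode"` no XML declaration is written, so this form suits inspection and logging more than writing files to disk.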

BeautifulSoup: what’s the difference between ‘lxml’ and ‘html.parser’ and ‘html5lib’ parsers?

From the docs' summarized table of advantages and disadvantages:

html.parser - BeautifulSoup(markup, "html.parser")
    Advantages: Batteries included, Decent speed, Lenient (as of Python 2.7.3 and 3.2.)
    Disadvantages: Not very lenient (before Python 2.7.3 or 3.2.2)

lxml - BeautifulSoup(markup, "lxml")
    Advantages: Very fast, Lenient
    Disadvantages: External C dependency

html5lib - BeautifulSoup(markup, "html5lib")
    Advantages: Extremely lenient, Parses pages …
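To see how the parsers differ in practice, it can help to feed the same broken markup to each one. The following is a rough sketch, assuming `bs4` is installed; the invalid fragment `<a></p>` is chosen only for illustration, and parsers that are not installed are skipped rather than treated as errors:

```python
from bs4 import BeautifulSoup, FeatureNotFound

markup = "<a></p>"  # deliberately invalid: a stray closing tag

results = {}
for parser in ("html.parser", "lxml", "html5lib"):
    try:
        results[parser] = str(BeautifulSoup(markup, parser))
    except FeatureNotFound:
        # lxml and html5lib are optional third-party installs.
        results[parser] = None

for parser, output in results.items():
    print(parser, "->", output if output is not None else "(not installed)")
```

Each parser repairs the fragment in its own way, so the printed trees will generally not be identical. That is why it is worth naming the parser explicitly rather than relying on whichever one happens to be installed.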

Out of memory issue when installing packages on Ubuntu server

Extend your available memory by adding a swap file: http://www.cyberciti.biz/faq/linux-add-a-swap-file-howto/ A swap file is a file stored on the computer's hard drive that is used as a temporary location for information that is not currently being used in RAM. By using a swap file, a computer can use more memory …
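Before and after adding a swap file, it may be useful to check how much RAM and swap the server actually reports. Here is a small sketch that parses text in the format of Linux's `/proc/meminfo`; the helper name `parse_meminfo` and the sample figures are made up for illustration:

```python
def parse_meminfo(text):
    """Return a dict of /proc/meminfo fields, using the leading number of each value."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


# Hypothetical sample resembling a small server with no swap configured.
sample = """MemTotal:         501792 kB
MemFree:           23456 kB
SwapTotal:             0 kB
SwapFree:              0 kB
"""

info = parse_meminfo(sample)
print("RAM: %d MB, swap: %d MB" % (info["MemTotal"] // 1024, info["SwapTotal"] // 1024))
```

On the server itself, the same function can be given the real contents with `parse_meminfo(open("/proc/meminfo").read())`. A `SwapTotal` of 0 suggests no swap is active yet.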

How do I use a default namespace in an lxml xpath query?

Something like this should work:

    import lxml.etree as et

    ns = {"atom": "http://www.w3.org/2005/Atom"}
    tree = et.fromstring(xml)
    for node in tree.xpath('//atom:entry', namespaces=ns):
        print(node)

See also http://lxml.de/xpathxslt.html#namespaces-and-prefixes.

Alternative, matching on the local name and ignoring the namespace:

    for node in tree.xpath("//*[local-name() = 'entry']"):
        print(node)
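The prefix used in the query does not have to appear in the document; it only has to be bound to the right namespace URI. The sketch below, assuming lxml is installed and using a made-up Atom feed, shows why an unprefixed path finds nothing, and one way to take the URI from the document instead of hard-coding it:

```python
from lxml import etree

# Hypothetical sample feed that declares Atom as its default (unprefixed) namespace.
xml = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>First</title></entry>
  <entry><title>Second</title></entry>
</feed>"""

tree = etree.fromstring(xml)

# An unprefixed name in XPath matches only elements in no namespace,
# so this returns an empty list for the feed above.
unprefixed = tree.xpath("//entry")

# nsmap stores the default namespace under the key None, which xpath()
# does not accept as a prefix, so bind the URI to a prefix of our choosing.
ns = {"atom": tree.nsmap[None]}
titles = tree.xpath("//atom:entry/atom:title/text()", namespaces=ns)
print(titles)
```

Every step of the path needs the prefix, including child elements such as `title`, because they inherit the default namespace from `feed`.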

Setup.py: install lxml with Python2.6 on CentOS

I had the same issue. I managed to install it after installing the packages libxslt-devel and python-devel, which seem to be what you are missing:

    yum install libxslt-devel python-devel
    python setup.py install

    Installed /usr/lib/python2.6/site-packages/lxml-2.2.8-py2.6-linux-i686.egg
    Processing dependencies for lxml==2.2.8
    Finished processing dependencies for lxml==2.2.8

However, since I also installed other packages in the process, you might want to …
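Once the install finishes, a quick way to confirm that the build works is to import it and print the versions of the C libraries it is using. This is only a sanity-check sketch; the string formatting is chosen so that it behaves the same on Python 2.6 and Python 3:

```python
from lxml import etree

# Version tuples exposed by lxml.etree for itself and the C libraries it uses.
print("lxml:    %s" % (etree.LXML_VERSION,))
print("libxml2: %s" % (etree.LIBXML_VERSION,))
print("libxslt: %s" % (etree.LIBXSLT_VERSION,))

# Smoke test: parse a tiny document and count the root's children.
root = etree.fromstring("<root><child/></root>")
child_count = len(root)
print("children of root: %d" % child_count)
```

If the import itself fails with a missing shared library, the development packages were probably not present when the extension was compiled.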

Best way for a beginner to learn screen scraping by Python [closed]

I agree that the Scrapy docs give off that impression. But I believe, as I found for myself, that if you are patient with Scrapy, go through the tutorials first, and then bury yourself in the rest of the documentation, you will not only start to understand the different parts of Scrapy better, but …