How do I pretty-print HTML with Nokogiri?

The answer by @mislav is somewhat wrong. Nokogiri does support pretty-printing if you:

- Parse the document as XML
- Instruct Nokogiri to ignore whitespace-only nodes ("blanks") during parsing
- Use to_xhtml or to_xml to specify pretty-printing parameters

In action:

    html = "<section> <h1>Main Section 1</h1><p>Intro</p> <section> <h2>Subhead 1.1</h2><p>Meat</p><p>MOAR MEAT</p> </section><section> <h2>Subhead 1.2</h2><p>Meat</p> </section></section>"

    require 'nokogiri'
    doc = Nokogiri::XML(html, &:noblanks)
    puts …

OS X Lion, Attempting Nokogiri install – libxml2 is missing

In Mavericks, installing the libraries with Homebrew and setting NOKOGIRI_USE_SYSTEM_LIBRARIES=1 before installing the gem did the trick for me. Summarising:

- If previously installed, uninstall the gem:

    $ gem uninstall nokogiri

- Use Homebrew to install libxml2, libxslt and libiconv:

    $ brew install libxml2 libxslt libiconv

- Install the gem specifying the paths to the libraries to be …

How do I parse an HTML table with Nokogiri?

    #!/usr/bin/ruby1.8

    require 'nokogiri'
    require 'pp'

    html = <<-EOS
      (The HTML from the question goes here)
    EOS

    doc = Nokogiri::HTML(html)
    rows = doc.xpath('//table/tbody[@id="threadbits_forum_251"]/tr')
    details = rows.collect do |row|
      detail = {}
      [
        [:title, 'td[3]/div[1]/a/text()'],
        [:name, 'td[3]/div[2]/span/a/text()'],
        [:date, 'td[4]/text()'],
        [:time, 'td[4]/span/text()'],
        [:number, 'td[5]/a/text()'],
        [:views, 'td[6]/text()'],
      ].each do |name, xpath|
        detail[name] = row.at_xpath(xpath).to_s.strip
      end
      detail
    end
    pp details

…

How to install Nokogiri Ruby gem with mkmf.log saying libiconv not found?

To diagnose and solve, here's what worked for me.

To find out what failed, go to your ruby gems directory. For example:

    $ cd <MY RUBY DIRECTORY>/lib/ruby/gems/2.0.0/gems

If you don't know your gem directory, try this:

    $ echo $GEM_HOME
    /opt/gems/2.0.0
    $ cd /opt/gems/2.0.0/gems

What version of nokogiri am I installing?

    $ ls -ladg nokogiri-*
    nokogiri-1.5.5 …

How to prevent Nokogiri from adding tags?

The problem occurs because you're using the wrong method in Nokogiri to parse your content.

    require 'nokogiri'

    doc = Nokogiri::HTML('<p>foobar</p>')
    puts doc.to_html
    # >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    # >> <html><body><p>foobar</p></body></html>

Rather than using HTML, which results in a complete document, use HTML.fragment, which tells Nokogiri you only want the fragment …

Error installing Nokogiri 1.5.0 with rails 3.1.0 and ubuntu

You need to have all the necessary libraries installed on your machine. When you installed RVM, it should have listed this for you. On the current version of rvm, you can run rvm requirements to see the exact list. Right now, that list is:

    sudo apt-get install build-essential openssl libreadline6 libreadline6-dev curl git-core zlib1g …

Getting attribute’s value in Nokogiri to extract link URLs

    html = <<HTML
    <div id="block">
      <a href="http://google.com">link</a>
    </div>
    HTML

    doc = Nokogiri::HTML(html)
    doc.xpath('//div/a/@href')
    #=> [#<Nokogiri::XML::Attr:0x80887798 name="href" value="http://google.com">]

Or if you wanna be more specific about the div:

    >> doc.xpath('//div[@id="block"]/a/@href')
    => [#<Nokogiri::XML::Attr:0x80887798 name="href" value="http://google.com">]
    >> doc.xpath('//div[@id="block"]/a/@href').first.value
    => "http://google.com"
