Web Scraping With Haskell

http://hackage.haskell.org/package/shpider

Shpider is a web automation library
for Haskell. It lets you write crawlers
quickly and, for simple cases such as
following links, without even reading
the page source.

It has useful features such as turning
relative links from a page into
absolute links, options to authorize
transactions only on a given domain,
and an option to download only HTML
documents.
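
To make the first two features concrete, here is a minimal sketch of resolving relative links and keeping only those on the page's own host. It does not use Shpider's API. It assumes the separate network-uri package (Network.URI), and the helper names absolutize and sameHost, as well as the sample links, are hypothetical.

```haskell
import Data.Maybe (mapMaybe)
import Network.URI
  ( URI, parseURI, parseURIReference, relativeTo, uriAuthority, uriRegName )

-- | Resolve a possibly relative link against the URL of the page it was
-- found on. Yields Nothing when the link is not a valid URI reference.
-- (Hypothetical helper, not part of Shpider.)
absolutize :: URI -> String -> Maybe URI
absolutize base link = (`relativeTo` base) <$> parseURIReference link

-- | True when both URIs name the same host. (Hypothetical helper.)
sameHost :: URI -> URI -> Bool
sameHost a b = host a == host b
  where
    host = fmap uriRegName . uriAuthority

main :: IO ()
main =
  case parseURI "http://apage.com/docs/index.html" of
    Nothing -> putStrLn "could not parse the base URL"
    Just base -> do
      -- Sample links as they might appear in a page's href attributes.
      let links = ["../about.html", "/contact", "http://elsewhere.org/x"]
          absolute = mapMaybe (absolutize base) links
      -- Print only the absolute links that stay on the base URL's host.
      mapM_ print (filter (sameHost base) absolute)
```

A crawler would run every extracted link through a step like absolutize before queueing it, and a filter like sameHost is one simple way to stay on a single domain.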

It also provides a nice syntax for
filling out forms.

An example:

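 runShpider $ do
       -- Fetch the page that contains the form.
       download "http://apage.com"
       -- Take the first form whose action is the given URL
       -- (the pattern match fails if no form matches).
       theForm : _ <- getFormsByAction "http://anotherpage.com"
       -- Fill in the named fields and submit the form.
       sendForm $ fillOutForm theForm $ pairs $ do
             "occupation" =: "unemployed Haskell programmer"
             "location" =: "mother's house"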

(Edit, 2018: shpider is deprecated. These days https://hackage.haskell.org/package/scalpel might be a good replacement.)
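
For comparison, here is a small sketch of collecting links with scalpel. It assumes the scalpel package is installed and runs on an inline HTML string that is invented for illustration. The name links is hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Text.HTML.Scalpel (Scraper, attrs, scrapeStringLike)

-- | Collect the href attribute of every <a> tag in a document.
-- (Hypothetical scraper name.)
links :: Scraper String [String]
links = attrs "href" "a"

main :: IO ()
main = do
  -- Inline sample HTML, so the sketch needs no network access.
  let html =
        "<html><body>\
        \<a href=\"/one\">One</a>\
        \<a href=\"http://example.com/two\">Two</a>\
        \</body></html>" :: String
  -- scrapeStringLike returns Nothing when the scraper fails to match.
  print (scrapeStringLike html links)
```

To scrape a live page instead of a string, scalpel also provides scrapeURL, which takes a URL in place of the HTML text and runs in IO.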
