Jsoup Cookies for HTTPS scraping

I know I'm about 10 months late here, but a good option with Jsoup is this simple piece of code:

// This will get you the response.
Response res = Jsoup
        .connect("url")
        .data("loginField", "[email protected]", "passField", "pass1234")
        .method(Method.POST)
        .execute();

// This will get you the cookies.
Map<String, String> cookies = res.cookies();

// And this is the …
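The map returned by res.cookies() is just name/value pairs. Passing it to a later Jsoup request with .cookies(cookies) sends those pairs back to the server in a Cookie request header. The sketch below uses only the JDK, not Jsoup, to show roughly what that header looks like; the class name, method name, and sample cookie values are hypothetical and chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class CookieHeaderSketch {

    // Joins name/value pairs as "name=value; name2=value2",
    // the general shape of a Cookie request header value.
    static String toCookieHeader(Map<String, String> cookies) {
        return cookies.entrySet().stream()
                .map(entry -> entry.getKey() + "=" + entry.getValue())
                .collect(Collectors.joining("; "));
    }

    public static void main(String[] args) {
        // Hypothetical cookies, standing in for the result of res.cookies().
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("sessionid", "abc123");
        cookies.put("lang", "en");

        System.out.println("Cookie: " + toCookieHeader(cookies));
        // prints: Cookie: sessionid=abc123; lang=en
    }
}
```

In practice you would not build this header by hand; handing the map to the next Jsoup connection is enough.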

Does jsoup support xpath?

When this was written, Jsoup didn't support XPath (releases from 1.14.3 onward add Element#selectXpath), but you may try Xsoup, "Jsoup with XPath". Here's an example quoted from the project's GitHub site (link):

@Test
public void testSelect() {
    String html = "<html><div><a href=\"https://github.com\">github.com</a></div>" +
            "<table><tr><td>a</td><td>b</td></tr></table></html>";
    Document document = Jsoup.parse(html);

    String result = Xsoup.compile("//a/@href").evaluate(document).get();
    Assert.assertEquals("https://github.com", result);

    List<String> list = Xsoup.compile("//tr/td/text()").evaluate(document).list();
    Assert.assertEquals("a", list.get(0));
    Assert.assertEquals("b", list.get(1));
}

…
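For comparison, the same two XPath expressions can be evaluated with the JDK's built-in javax.xml.xpath package. This is a minimal sketch of a different technique, not Jsoup or Xsoup: it assumes the markup is well-formed XML (real-world HTML often is not, which is where Jsoup's lenient parser helps). The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class JdkXPathSketch {

    // Parses a well-formed markup string into a W3C DOM document.
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Markup is not well-formed XML", e);
        }
    }

    // Returns the text content of every node matched by the XPath expression.
    static List<String> select(String markup, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, parse(markup), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><div><a href=\"https://github.com\">github.com</a></div>"
                + "<table><tr><td>a</td><td>b</td></tr></table></html>";
        System.out.println(select(html, "//a/@href"));      // prints [https://github.com]
        System.out.println(select(html, "//tr/td/text()")); // prints [a, b]
    }
}
```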

Jsoup select div having multiple classes

Works for me with the latest Jsoup (1.5.2).

String html = "<div class=\"content-text right-align bold-font\">foo</div>";
Document document = Jsoup.parse(html);
Elements elements = document.select("div.content-text.right-align.bold-font");
System.out.println(elements.text()); // foo

So either you're using an outdated version of Jsoup that has a bug related to this, or the actual HTML doesn't contain a <div> like that.

Jsoup select and iterate all elements

You can select all elements of the document using the * selector and then get the text of each one individually using Element#ownText().

Elements elements = document.body().select("*");

for (Element element : elements) {
    System.out.println(element.ownText());
}

(how) can I download an image using JSoup?

I didn't even finish writing the question before I found the answer via JSoup and a little experimentation.

// Open a URL stream.
Response resultImageResponse = Jsoup.connect(imageLocation)
        .cookies(cookies)
        .ignoreContentType(true)
        .execute();

// Write the output; resultImageResponse.bodyAsBytes() holds the image's contents.
try (FileOutputStream out = new FileOutputStream(new java.io.File(outputFolder + name))) {
    out.write(resultImageResponse.bodyAsBytes());
}

How to parse XML with jsoup

It seems the latest version of Jsoup (1.6.2, released March 28, 2012) includes some basic support for XML.

String html = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><tests><test><id>xxx</id><status>xxx</status></test><test><id>xxx</id><status>xxx</status></test></tests>";
Document doc = Jsoup.parse(html, "", Parser.xmlParser());

for (Element e : doc.select("test")) {
    System.out.println(e);
}

Give that a shot.

Jsoup: how to get an image’s absolute url?

Once you have the image element, e.g.:

Element image = document.select("img").first();
String url = image.absUrl("src");
// url = http://www.example.com/images/chicken.jpg

Alternatively:

String url = image.attr("abs:src");

Jsoup has a built-in absUrl() method on all nodes to resolve an attribute to an absolute URL, using the base URL of the node (which could be different from the URL …
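To make "resolve against the base URL" concrete, here is a minimal sketch using the JDK's java.net.URI rather than Jsoup. It illustrates the general idea behind absUrl(), not Jsoup's actual implementation; the class name, method name, and the gallery URLs are hypothetical. One difference worth knowing: as far as I'm aware, absUrl() returns an empty string when it cannot build an absolute URL, whereas URI.create throws on malformed input.

```java
import java.net.URI;

class AbsoluteUrlSketch {

    // Resolves a possibly relative src value against the page's base URL.
    static String resolve(String baseUrl, String src) {
        return URI.create(baseUrl).resolve(src).toString();
    }

    public static void main(String[] args) {
        String base = "http://www.example.com/gallery/index.html";

        // Root-relative path: replaces the whole path of the base URL.
        System.out.println(resolve(base, "/images/chicken.jpg"));
        // prints http://www.example.com/images/chicken.jpg

        // Document-relative path: resolved against the base URL's directory.
        System.out.println(resolve(base, "thumbs/egg.png"));
        // prints http://www.example.com/gallery/thumbs/egg.png

        // Already absolute: returned unchanged.
        System.out.println(resolve(base, "http://cdn.example.org/x.png"));
        // prints http://cdn.example.org/x.png
    }
}
```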
