JSoup UserAgent, how to set it right?

You might try setting the referrer header as well:

    doc = Jsoup.connect("https://www.facebook.com/")
        .userAgent("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6")
        .referrer("http://www.google.com")
        .get();

jsoup – strip all formatting and link tags, keep text only

With Jsoup:

    final String html = "<p> <span> foo </span> <em> bar <a> foobar </a> baz </em> </p>";
    Document doc = Jsoup.parse(html);
    System.out.println(doc.text());

Output:

    foo bar foobar baz

If you want only the text of the p tag, use this instead of doc.text():

    doc.select("p").text();

… or only the body:

    doc.body().text();

Line break:

    final String html = "<p><strong>Tarthatatlan biztonsági viszonyok</strong></p>" …

jsoup posting and cookie

When you log in to the site, it is probably setting an authorised session cookie that needs to be sent on subsequent requests to maintain the session. You can get the cookie like this:

    Connection.Response res = Jsoup.connect("http://www.example.com/login.php")
        .data("username", "myUsername", "password", "myPassword")
        .method(Method.POST)
        .execute();
    Document doc = res.parse();
    String sessionId = res.cookie("SESSIONID"); // you will need …
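
To make the session-cookie round trip concrete, here is a minimal sketch at the HTTP header level. It uses only the JDK's java.net.HttpCookie rather than jsoup, so it is an illustration of the mechanism and not the answer's own code. The class name SessionCookieSketch, its helper methods, and the sample header value are all hypothetical.

```java
import java.net.HttpCookie;
import java.util.List;

// Hypothetical helper class; not part of jsoup.
class SessionCookieSketch {

    // Returns the value of the named cookie from a Set-Cookie response header,
    // or null if that cookie is not present.
    static String cookieValue(String setCookieHeader, String name) {
        List<HttpCookie> cookies = HttpCookie.parse(setCookieHeader);
        for (HttpCookie c : cookies) {
            if (c.getName().equals(name)) {
                return c.getValue();
            }
        }
        return null;
    }

    // Builds the value of the Cookie request header that the next request must carry.
    static String cookieHeader(String name, String value) {
        return name + "=" + value;
    }

    public static void main(String[] args) {
        // Assumed example of what a login response might send back.
        String setCookie = "Set-Cookie: SESSIONID=abc123; Path=/; HttpOnly";
        String sessionId = cookieValue(setCookie, "SESSIONID");
        // prints "Cookie: SESSIONID=abc123"
        System.out.println("Cookie: " + cookieHeader("SESSIONID", sessionId));
    }
}
```

With jsoup itself, the corresponding step would presumably be to pass the value obtained from res.cookie("SESSIONID") to .cookie("SESSIONID", sessionId) on the next Jsoup.connect(...) call, so that the server sees the same session.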

How do I preserve line breaks when using jsoup to convert html to plain text?

The real solution that preserves line breaks should be like this:

    public static String br2nl(String html) {
        if (html == null) return html;
        Document document = Jsoup.parse(html);
        // makes html() preserve linebreaks and spacing
        document.outputSettings(new Document.OutputSettings().prettyPrint(false));
        document.select("br").append("\\n");
        document.select("p").prepend("\\n\\n");
        String s = document.html().replaceAll("\\\\n", "\n");
        return Jsoup.clean(s, "", Whitelist.none(), new Document.OutputSettings().prettyPrint(false));
    }

It satisfies the following requirements: if the original html contains newline (\n), …
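
The escaping in br2nl is easy to misread, so here is a small JDK-only sketch of the marker trick it relies on. The Java literal "\\n" is a two-character marker (a backslash followed by n), and the regex literal "\\\\n" matches exactly that marker so it can be swapped for a real newline. The class and method names are hypothetical and only illustrate the replaceAll step, not the jsoup calls.

```java
// Hypothetical helper class illustrating the escape levels used in br2nl.
class EscapeLevelsSketch {

    // Replaces each two-character marker (backslash + 'n') with a real newline.
    // The regex literal "\\\\n" is the pattern \\n, i.e. a literal backslash then 'n'.
    static String markersToNewlines(String s) {
        return s.replaceAll("\\\\n", "\n");
    }

    public static void main(String[] args) {
        // "\\n" in Java source is two characters, not a line break.
        String marked = "line one\\nline two";
        System.out.println(marked.contains("\n"));   // prints "false"
        String restored = markersToNewlines(marked);
        System.out.println(restored.contains("\n")); // prints "true"
    }
}
```

Using a visible marker instead of a real newline presumably lets the text pass through html() and Jsoup.clean() without the inserted breaks being confused with whitespace already present in the markup.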
