Python and BeautifulSoup encoding issues [duplicate]

In your case the page contains invalid UTF-8 data, which confuses BeautifulSoup and makes it think the page uses windows-1252. You can use this trick: soup = BeautifulSoup.BeautifulSoup(content.decode('utf-8', 'ignore')) This discards any invalid bytes from the page source, so BeautifulSoup guesses the encoding correctly. You can replace 'ignore' with 'replace' …
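A minimal sketch of the decoding step, assuming the bs4 package (the excerpt uses the older BeautifulSoup 3 import) and a made-up byte string named content that contains one invalid byte. It only illustrates how the 'ignore' and 'replace' error handlers differ:

```python
from bs4 import BeautifulSoup

# Hypothetical page source: valid UTF-8 ("café") plus one invalid byte (0xFF).
content = b"<html><body><p>caf\xc3\xa9 \xff ok</p></body></html>"

# 'ignore' silently drops bytes that are not valid UTF-8.
text_ignored = content.decode("utf-8", "ignore")

# 'replace' substitutes U+FFFD (the replacement character) for each bad byte.
text_replaced = content.decode("utf-8", "replace")

# Passing already-decoded text means BeautifulSoup does not have to guess the encoding.
soup = BeautifulSoup(text_ignored, "html.parser")
paragraph = soup.p.get_text()
```

With 'ignore' the bad byte disappears without a trace, while 'replace' leaves a visible marker, which can help when you want to see where the source was damaged.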

Find a link that contains a specific word using BeautifulSoup

You can do it with a simple "contains" CSS selector: soup.select("a[href*=location]") Or, if only one link needs to be matched, use select_one(): soup.select_one("a[href*=location]") And, of course, there are many other ways. For instance, you can use find_all() with the href argument, which can take a regular expression or a function as its value: import re soup.find_all("a", …
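The approaches named above can be compared side by side. This is a sketch on invented sample HTML, assuming bs4 with its default CSS selector support. The regular expression and the function passed to find_all() are illustrative choices, not the completion of the truncated snippet:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: two links whose href contains "location", one without.
html = """
<a href="/location/paris">Paris</a>
<a href="/about">About</a>
<a href="/my-location">Me</a>
"""
soup = BeautifulSoup(html, "html.parser")

# CSS "contains" selector: every <a> whose href includes the substring.
css_matches = soup.select('a[href*="location"]')

# Only the first match.
first_match = soup.select_one('a[href*="location"]')

# find_all() with a regular expression as the href filter.
regex_matches = soup.find_all("a", href=re.compile("location"))

# find_all() with a function; the guard handles a missing href (None).
func_matches = soup.find_all("a", href=lambda h: h and "location" in h)
```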

BeautifulSoup: what’s the difference between ‘lxml’ and ‘html.parser’ and ‘html5lib’ parsers?

From the docs' summary table of advantages and disadvantages:
html.parser: BeautifulSoup(markup, "html.parser"). Advantages: batteries included, decent speed, lenient (as of Python 2.7.3 and 3.2). Disadvantages: not very lenient (before Python 2.7.3 or 3.2.2).
lxml: BeautifulSoup(markup, "lxml"). Advantages: very fast, lenient. Disadvantages: external C dependency.
html5lib: BeautifulSoup(markup, "html5lib"). Advantages: extremely lenient, parses pages …
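One way to see the parsers' differing leniency is to feed the same invalid fragment to each one. The sketch below assumes bs4 is installed; lxml and html5lib are optional, so a missing parser is recorded as None instead of raising. The markup string is only a sample:

```python
from bs4 import BeautifulSoup, FeatureNotFound

# A deliberately invalid fragment: an unclosed <a> and a stray </p>.
markup = "<a></p>"

results = {}
for parser in ("html.parser", "lxml", "html5lib"):
    try:
        results[parser] = str(BeautifulSoup(markup, parser))
    except FeatureNotFound:
        # Raised by bs4 when the requested parser library is not installed.
        results[parser] = None

for parser, output in results.items():
    print(parser, "->", output)
```

Each parser repairs the fragment in its own way, so the printed trees will generally differ. If results must be reproducible across machines, name the parser explicitly.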

Find HTML attribute values using BeautifulSoup

soup.find("div", {"class": "real number"})['data-value'] Here you are searching for a div element, but in your example HTML it is the span that has the "real number" class. Try instead: soup.find("span", {"class": "real number", "data-value": True})['data-value'] Here we are also checking for the presence of the data-value attribute. To find elements having "real number" or "fake number" classes, you can make …
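A sketch of the lookup on invented markup, since the question's HTML is not shown here. The CSS selector used for the "real number" or "fake number" case is one possible approach and an assumption, not necessarily what the truncated answer goes on to suggest:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: spans carry the classes; one span lacks data-value.
html = """
<div>
  <span class="real number">no value here</span>
  <span class="real number" data-value="42">42</span>
  <span class="fake number" data-value="7">7</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Searching for a div with that class finds nothing, so indexing would fail.
missing = soup.find("div", {"class": "real number"})

# Match the span, and require that the data-value attribute is present.
value = soup.find("span", {"class": "real number", "data-value": True})["data-value"]

# Assumed alternative for either class: a CSS selector group.
either = soup.select("span.real.number[data-value], span.fake.number[data-value]")
either_values = [tag["data-value"] for tag in either]
```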

How to find all divs whose class starts with a string in BeautifulSoup?

Well, these are id attributes you are showing: <div id="span3 span49"> <div id="span3 span39"> In this case, you can use: soup.find_all("div", id=lambda value: value and value.startswith("span3")) Or: soup.find_all("div", id=re.compile("^span3")) If this was just a typo, and you actually have class attributes that start with span3, and you really need to check that the class starts with …
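Both id filters can be checked on a small sample. The markup below is invented, and the class-based variant at the end is an assumed illustration of the same idea rather than the truncated answer's own code:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: two ids starting with "span3", one other id,
# one div with no id, and one div that uses a class instead.
html = """
<div id="span3 span49">a</div>
<div id="span3 span39">b</div>
<div id="other">c</div>
<div>no id</div>
<div class="span3 wide">d</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Function filter: "value and ..." skips divs that have no id (value is None).
by_function = soup.find_all("div", id=lambda value: value and value.startswith("span3"))

# Regex filter anchored at the start of the id string.
by_regex = soup.find_all("div", id=re.compile("^span3"))

# Assumed class variant: class_ filters are tested against each class name.
by_class = soup.find_all("div", class_=lambda c: c and c.startswith("span3"))
```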
