beautifulsoup – Tarik Billa

Error “AttributeError ‘collections’ has no attribute ‘Callable’ ” using Beautiful Soup

April 12, 2024 by Tarik

collections.Callable moved to collections.abc.Callable in Python 3.3, and the old alias was removed in Python 3.10. A hacky solution is to add the reference back to collections before importing the problem library:

    import collections
    import collections.abc
    collections.Callable = collections.abc.Callable
    from bs4 import BeautifulSoup  # for example
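A slightly more defensive version of the same workaround is sketched below. It restores the alias only when it is missing, so it should do nothing on interpreters that still provide it. The `len` check at the end is only an illustration and is not part of the original answer.

```python
import collections
import collections.abc

# Restore the removed alias only when it is missing (Python 3.10 and later).
if not hasattr(collections, "Callable"):
    collections.Callable = collections.abc.Callable

# Legacy code that still refers to collections.Callable can now resolve it.
is_callable = isinstance(len, collections.Callable)
print(is_callable)
```

Run this before importing the library that raises the AttributeError; upgrading that library is the longer-term fix.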

How do you get all the rows from a particular table using BeautifulSoup?

April 9, 2024 by Tarik

This should be pretty straightforward if you have a chunk of HTML to parse with BeautifulSoup. The general idea is to navigate to your table using the findChildren method; you can then get the text value inside each cell with the string property.

    >>> from BeautifulSoup import BeautifulSoup
    >>>
    >>> html = """ … …
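The excerpt above is cut off, so here is a minimal sketch of the same idea. It assumes the newer bs4 package rather than the BeautifulSoup 3 import shown above, and it uses find_all instead of findChildren. The sample table and its "scores" id are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; any table with <tr>/<td> cells works the same way.
html = """
<table id="scores">
  <tr><td>Alice</td><td>90</td></tr>
  <tr><td>Bob</td><td>85</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="scores")

rows = []
for tr in table.find_all("tr"):
    # .string holds the cell text when the cell contains a single text node.
    rows.append([str(td.string) for td in tr.find_all("td")])

print(rows)
```

For cells that contain nested tags, `.string` is None; `td.get_text()` is the usual alternative in that case.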

Python and BeautifulSoup encoding issues [duplicate]

April 9, 2024 by Tarik

In your case, the page contains invalid UTF-8 data, which confuses BeautifulSoup and makes it think that the page uses windows-1252. You can use this trick:

    soup = BeautifulSoup.BeautifulSoup(content.decode('utf-8', 'ignore'))

This discards any invalid symbols from the page source, so BeautifulSoup will guess the encoding correctly. You can replace 'ignore' by 'replace' …
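To see what the two error handlers do on their own, here is a small standard-library sketch. The byte string is a made-up example with one invalid byte; it is not taken from the page discussed in the answer.

```python
# Valid UTF-8 for "café ", followed by one invalid byte, then " menu".
raw = "caf\u00e9 ".encode("utf-8") + b"\xff" + b" menu"

# 'ignore' silently drops bytes that cannot be decoded.
ignored = raw.decode("utf-8", "ignore")

# 'replace' substitutes U+FFFD (the replacement character) for each bad byte.
replaced = raw.decode("utf-8", "replace")

# The default strict mode raises instead of guessing.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(repr(ignored), repr(replaced), strict_failed)
```

Either decoded string can then be passed to BeautifulSoup in place of the raw bytes.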

Find a link that contains a specific word using BeautifulSoup

April 8, 2024 by Tarik

You can do it with a simple "contains" CSS selector:

    soup.select("a[href*=location]")

Or, if only one link needs to be matched, use select_one():

    soup.select_one("a[href*=location]")

And, of course, there are many other ways. For instance, you can use find_all(), providing the href argument, which can have a regular expression value or a function:

    import re
    soup.find_all("a", …
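The approaches mentioned above can be compared side by side. This is a sketch that assumes bs4 with its CSS selector support installed. The sample links are invented, and the regular expression and lambda are one plausible way to finish the truncated find_all() call, not necessarily what the original answer used.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup.
html = """
<a href="/about">About</a>
<a href="/store/location/paris">Paris</a>
<a href="/store/location/rome">Rome</a>
"""
soup = BeautifulSoup(html, "html.parser")

# CSS "contains" selector: every <a> whose href includes "location".
by_css = soup.select('a[href*="location"]')

# Only the first match.
first = soup.select_one('a[href*="location"]')

# find_all() with a regular expression for href.
by_regex = soup.find_all("a", href=re.compile("location"))

# find_all() with a function; the guard skips links that have no href.
by_func = soup.find_all("a", href=lambda h: h and "location" in h)

print([a["href"] for a in by_css])
```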

What is the difference between beautifulsoup and bs4

April 5, 2024 by Tarik

When I go to the beautifulsoup 4.0 documentation, the first page has this information: (The BeautifulSoup package is probably not what you want. That’s the previous major release, Beautiful Soup 3. Lots of software uses BS3, so it’s still available, but if you’re writing new code you should install beautifulsoup4.)

BeautifulSoup: what’s the difference between ‘lxml’ and ‘html.parser’ and ‘html5lib’ parsers?

April 3, 2024 by Tarik

From the docs' summarized table of advantages and disadvantages:

html.parser – BeautifulSoup(markup, "html.parser")
    Advantages: Batteries included, Decent speed, Lenient (as of Python 2.7.3 and 3.2.)
    Disadvantages: Not very lenient (before Python 2.7.3 or 3.2.2)

lxml – BeautifulSoup(markup, "lxml")
    Advantages: Very fast, Lenient
    Disadvantages: External C dependency

html5lib – BeautifulSoup(markup, "html5lib")
    Advantages: Extremely lenient, Parses pages …

Find HTML attribute values using BeautifulSoup

April 2, 2024 by Tarik

    soup.find("div", {"class": "real number"})['data-value']

Here you are searching for a div element, but in your example HTML it is the span that has the "real number" class. Try instead:

    soup.find("span", {"class": "real number", "data-value": True})['data-value']

Here we are also checking for the presence of the data-value attribute. To find elements having "real number" or "fake number" classes, you can make …
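A small sketch of the corrected lookup, assuming bs4 is installed. The markup is a made-up stand-in for the asker's HTML, which is not shown in the excerpt.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the asker's markup.
html = '<p>Price: <span class="real number" data-value="42">42</span></p>'
soup = BeautifulSoup(html, "html.parser")

# Searching for a div finds nothing, so indexing the result would fail.
missing = soup.find("div", {"class": "real number"})

# Searching for the span, and requiring that data-value is present, works.
span = soup.find("span", {"class": "real number", "data-value": True})
value = span["data-value"]

print(missing, value)
```

Checking the result of find() for None before indexing avoids a TypeError when nothing matches.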

How to find all divs whose class starts with a string in BeautifulSoup?

January 3, 2024 by Tarik

Well, these are id attributes you are showing:

    <div id="span3 span49">
    <div id="span3 span39">

In this case, you can use:

    soup.find_all("div", id=lambda value: value and value.startswith("span3"))

Or:

    soup.find_all("div", id=re.compile("^span3"))

If this was just a typo, and you actually have class attributes that start with span3, and you really need to check the class to start with …
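Both id-based variants can be tried on a small document. This sketch assumes bs4 is installed; the third div is added here only to show a non-matching case.

```python
import re
from bs4 import BeautifulSoup

html = """
<div id="span3 span49">first</div>
<div id="span3 span39">second</div>
<div id="span4 span10">third</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Function filter: the guard handles divs that have no id at all.
by_func = soup.find_all("div", id=lambda value: value and value.startswith("span3"))

# Regular expression anchored at the start of the id value.
by_regex = soup.find_all("div", id=re.compile("^span3"))

print([div.get_text() for div in by_func])
```

This works because id is treated as a single string, so the filter sees the whole value "span3 span49".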

import-im6.q16: not authorized error ‘os’ @ error/constitute.c/WriteImage/1037 for a Python web scraper

December 30, 2023 by Tarik

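I am pretty sure you are missing a shebang at the beginning of your file. Without one, the shell runs the script itself, and the line "import os" invokes ImageMagick's import command instead of Python. For example, start the file with one of:

    #!/usr/bin/env python3
    #!/usr/bin/env python2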
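As an illustration, a script laid out like the sketch below should be handed to Python rather than to the shell when it is executed directly. The function name and body are placeholders, not part of the original question.

```python
#!/usr/bin/env python3
"""Minimal script layout: the shebang has to be the very first line."""
import os


def current_directory():
    # Placeholder work; a real scraper would do its fetching here.
    return os.getcwd()


if __name__ == "__main__":
    print(current_directory())
```

Running the file explicitly with the interpreter (for example `python3 script.py`) sidesteps the problem even without a shebang.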

Beautifulsoup multiple class selector

December 23, 2023 by Tarik

Use CSS selectors instead: soup.select('div.A.B')
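A quick sketch of how that selector behaves, assuming bs4 with CSS selector support; the sample divs are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup.
html = """
<div class="A B">both</div>
<div class="A">only A</div>
<div class="B A C">both plus C</div>
"""
soup = BeautifulSoup(html, "html.parser")

# div.A.B matches divs that carry both classes, in any order,
# regardless of any extra classes on the element.
matches = soup.select("div.A.B")

print([div.get_text() for div in matches])
```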