screen-scraping – Page 4

What’s the best way of scraping data from a website? [closed]

December 25, 2022 by Tarik

You will definitely want to start with a good web scraping framework. Later on you may decide that a framework is too limiting and put together your own stack of libraries, but without a lot of scraping experience your own design will be much worse than pjscrape or scrapy. Note: I use the terms crawling …

Can scrapy be used to scrape dynamic content from websites that are using AJAX?

November 14, 2022 by Tarik

Here is a simple example of scrapy with an AJAX request. Let's look at the site rubin-kazan.ru. All messages are loaded with an AJAX request. My goal is to fetch these messages with all their attributes (author, date, …). When I analyze the source code of the page, I can’t see all these messages because the …

Web scraping with Python [closed]

October 19, 2022 by Tarik

Use urllib2 in combination with the brilliant BeautifulSoup library:

    import urllib2
    from BeautifulSoup import BeautifulSoup
    # or if you're using BeautifulSoup4:
    # from bs4 import BeautifulSoup

    soup = BeautifulSoup(urllib2.urlopen('http://example.com').read())

    for row in soup('table', {'class': 'spad'})[0].tbody('tr'):
        tds = row('td')
        print tds[0].string, tds[1].string
        # will print date and sunrise
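The snippet above targets Python 2 (urllib2 and the print statement). For readers on Python 3, here is a minimal sketch of the same idea, pulling the cells out of each row of a table with class "spad". It is not the BeautifulSoup approach from the answer: it swaps in the standard library's html.parser so it runs without extra packages. The names TableRowParser, extract_rows and SAMPLE_HTML are made up for illustration, and the sample markup is an assumption about what such a page might look like. Unlike the original, which reads only the first matching table, this sketch collects rows from every matching table.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the cell text of every <tr> inside <table class="spad">."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._in_table = False  # True while inside a matching table
        self._row = None        # cells of the row currently being read
        self._cell = None       # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        # A class attribute may be missing or valueless, hence the "or".
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table" and "spad" in classes:
            self._in_table = True
        elif self._in_table and tag == "tr":
            self._row = []
        elif self._row is not None and tag == "td":
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "table":
            self._in_table = False


def extract_rows(html):
    """Return a list of rows, each a list of stripped cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<table class="spad"><tbody>
<tr><td>1 Jan</td><td>08:06</td></tr>
<tr><td>2 Jan</td><td>08:06</td></tr>
</tbody></table>
"""

for date, sunrise in extract_rows(SAMPLE_HTML):
    print(date, sunrise)
```

To run this against a live page, the HTML string could come from urllib.request.urlopen(url).read().decode() instead of SAMPLE_HTML. For messy real-world markup, BeautifulSoup (imported from bs4 on Python 3) is likely the more forgiving choice; the sketch above does not handle nested tables.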

How do I prevent site scraping? [closed]

September 27, 2022 by Tarik

Note: Since the complete version of this answer exceeds Stack Overflow’s length limit, you’ll need to head to GitHub to read the extended version, with more tips and details. In order to hinder scraping (also known as Webscraping, Screenscraping, Web data mining, Web harvesting, or Web data extraction), it helps to know how these scrapers …