scrapy – Page 6 – Tarik Billa

How to pass a user defined argument in scrapy spider

December 17, 2022 by Tarik

Spider arguments are passed to the crawl command using the -a option. For example:

    scrapy crawl myspider -a category=electronics -a domain=system

Spiders can access arguments as attributes:

    import scrapy

    class MySpider(scrapy.Spider):
        name = "myspider"

        def __init__(self, category="", **kwargs):
            # f-strings require Python 3.6+
            self.start_urls = [f"http://www.example.com/{category}"]
            # Python 3 super(); scrapy.Spider stores the remaining kwargs
            # (here: domain) as instance attributes
            super().__init__(**kwargs)

        def parse(self, response):
            self.log(self.domain)  # logs "system"

Taken from the Scrapy doc: http://doc.scrapy.org/en/latest/topics/spiders.html#spider-arguments

Update …
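Here is a minimal, standard-library-only sketch of what happens to those -a options, so you can try it without running Scrapy. SpiderSketch and parse_spider_args are hypothetical stand-ins, not Scrapy's actual code. The parser collects repeated -a NAME=VALUE pairs into a dict. The class mimics scrapy.Spider by copying leftover keyword arguments onto the instance. The sketch also shows that every value arrives as a string, so a spider has to convert numeric arguments itself.

```python
import argparse


class SpiderSketch:
    """Hypothetical stand-in for scrapy.Spider: extra keyword arguments
    become instance attributes, as they do on Scrapy's base class."""
    name = "myspider"

    def __init__(self, category="", **kwargs):
        self.start_urls = [f"http://www.example.com/{category}"]
        self.__dict__.update(kwargs)  # e.g. domain -> self.domain


def parse_spider_args(argv):
    """Collect repeated `-a NAME=VALUE` options into a dict of strings."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-a", dest="spargs", action="append",
                        default=None, metavar="NAME=VALUE")
    opts = parser.parse_args(argv)
    args = {}
    for pair in opts.spargs or []:
        name, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"invalid -a value, use -a NAME=VALUE: {pair!r}")
        args[name] = value  # always a string, even for "3"
    return args


spider_args = parse_spider_args(["-a", "category=electronics", "-a", "domain=system"])
spider = SpiderSketch(**spider_args)
print(spider.start_urls)
print(spider.domain)
```

Passing -a pages=3 would give the spider the string "3", so a spider that expects a number should call int() on it.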

Difference between BeautifulSoup and Scrapy crawler?

November 17, 2022 by Tarik

Scrapy is a web spider (web scraper) framework. You give Scrapy a root URL to start crawling from, and you can then specify constraints such as how many URLs you want to crawl and fetch. It is a complete framework for web scraping or crawling, while BeautifulSoup is a parsing library which also does a pretty good …
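The crawl-versus-parse split can be sketched offline in plain Python. All names here are made up for illustration: FAKE_WEB, LinkExtractor and crawl. The standard library's html.parser stands in for a real parsing library. crawl() plays the framework role: it starts at a root URL and stops after max_pages pages. LinkExtractor plays the parsing role. In Scrapy itself, a page cap like this would normally come from a setting such as CLOSESPIDER_PAGECOUNT rather than hand-written code.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web" so the sketch runs without network access.
FAKE_WEB = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/c">C</a>',
    "http://example.com/b": '<a href="/">home</a>',
    "http://example.com/c": "",
}


class LinkExtractor(HTMLParser):
    """Parsing step (what a library like BeautifulSoup covers): collect hrefs."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(root_url, max_pages):
    """Crawling step (what a framework like Scrapy covers): breadth-first
    from root_url, stopping once max_pages pages have been fetched."""
    queue = deque([root_url])
    seen = {root_url}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = FAKE_WEB.get(url)  # stands in for an HTTP request
        if html is None:
            continue
        visited.append(url)
        extractor = LinkExtractor()
        extractor.feed(html)
        extractor.close()
        for href in extractor.links:
            absolute = urljoin(url, href)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


print(crawl("http://example.com/", max_pages=3))
```
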

Can scrapy be used to scrape dynamic content from websites that are using AJAX?

November 14, 2022 by Tarik

Here is a simple example of Scrapy with an AJAX request. Let’s look at the site rubin-kazan.ru, where all messages are loaded with an AJAX request. My goal is to fetch these messages with all their attributes (author, date, …). When I analyze the source code of the page I can’t see all these messages because the …
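The messages never appear in the page source. A common approach is to find the AJAX request in the browser’s developer tools (Network tab) and request that URL directly, since such endpoints often return JSON. The sketch below covers only the parsing half, using the standard json module. The payload layout and field names are assumptions for illustration, not the site’s real response. Inside a Scrapy callback, body would be response.text.

```python
import json

# Hypothetical AJAX response body; the real endpoint and JSON layout depend on the site.
SAMPLE_BODY = """
{"messages": [
    {"author": "alice", "date": "2012-05-01", "text": "Hello"},
    {"author": "bob", "date": "2012-05-02", "text": "Hi there"}
]}
"""


def extract_messages(body):
    """Turn a JSON AJAX response into a list of message dicts."""
    data = json.loads(body)
    messages = []
    for item in data.get("messages", []):
        messages.append({
            "author": item.get("author"),
            "date": item.get("date"),
            "text": item.get("text"),
        })
    return messages


for message in extract_messages(SAMPLE_BODY):
    print(message["date"], message["author"], message["text"])
```
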

“OSError: [Errno 1] Operation not permitted” when installing Scrapy in OSX 10.11 (El Capitan) (System Integrity Protection)

October 15, 2022 by Tarik

Running pip install --ignore-installed six should do the trick. Source: github.com/pypa/pip/issues/3165

Cannot install Lxml on Mac OS X 10.9

October 8, 2022 by Tarik

You should install or upgrade the Xcode Command Line Tools. Try this in a terminal: xcode-select --install

Scraping: SSL: CERTIFICATE_VERIFY_FAILED error for http://en.wikipedia.org

October 8, 2022 by Tarik

I once ran into this issue myself. If you’re using macOS, go to Macintosh HD > Applications > Python 3.6 folder (or whichever version of Python you’re using) and double-click the “Install Certificates.command” file. 😀
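To check whether Python can find a CA bundle at all, the standard library’s ssl.get_default_verify_paths() reports where OpenSSL expects one. This is a sketch that assumes a python.org build on macOS. On such builds the expected cert.pem is typically missing until Install Certificates.command has run, because that script links certifi’s bundle into the location. The helper name ca_bundle_status is hypothetical.

```python
import os
import ssl


def ca_bundle_status():
    """Return (path, exists) for OpenSSL's compiled-in default CA file."""
    paths = ssl.get_default_verify_paths()
    # openssl_cafile is the built-in default; it ignores an SSL_CERT_FILE override
    cafile = paths.openssl_cafile
    return cafile, os.path.isfile(cafile)


path, found = ca_bundle_status()
print(path, "found" if found else "missing")
```
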

Headless Browser and scraping – solutions [closed]

September 23, 2022 by Tarik

If Ruby is your thing, you may also try:

- https://github.com/chriskite/anemone (development stopped)
- https://github.com/sparklemotion/mechanize
- https://github.com/postmodern/spidr
- https://github.com/stewartmckee/cobweb
- http://watirwebdriver.com/ (Selenium)

The Nokogiri gem can also be used for scraping: http://nokogiri.org/. There is a dedicated book from Packt Publishing about using Nokogiri for scraping.