How do I avoid HTTP error 403 when web scraping with Python?

This is probably because of mod_security or a similar server security feature that blocks known spider/bot user agents (urllib sends something like Python-urllib/3.3, which is easily detected). Try setting a known browser user agent:

    from urllib.request import Request, urlopen

    req = Request(
        url="http://www.cmegroup.com/trading/products/#sortField=oi&sortAsc=false&venues=3&page=1&cleared=1&group=1",
        headers={'User-Agent': 'Mozilla/5.0'}
    )
    webpage = urlopen(req).read()

This works for me. By …
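To see why the default client is easy to spot, the sketch below compares the User-Agent that urllib advertises by default with the one set on a Request. It sends no traffic; EXAMPLE_URL is a placeholder assumed for illustration and is not part of the original answer.

```python
from urllib.request import Request, build_opener

# Placeholder URL assumed for illustration; nothing is fetched in this sketch.
EXAMPLE_URL = "http://example.com/"

# A default opener identifies itself as "Python-urllib/<major>.<minor>".
default_agent = dict(build_opener().addheaders)["User-agent"]

# A Request built with an explicit header carries the browser-like value instead.
req = Request(url=EXAMPLE_URL, headers={"User-Agent": "Mozilla/5.0"})
# urllib stores header names capitalized as "User-agent".
custom_agent = req.get_header("User-agent")

print(default_agent)
print(custom_agent)
```

Passing req to urlopen would then send the overridden header rather than the default one.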

Instagram/feed API media URL shows ‘URL signature expired’

As a workaround, you can use the media URL with some extra parameters to get the desired image, instead of using the direct image link. For example:

    https://www.instagram.com/p/Bo7OXJ3hYM8/media/?size=m

Notice the added /media/?size=m. The letter can be t, m, or l for different picture sizes. This should return the desired image.

Reference: https://www.instagram.com/developer/embedding/
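The URL pattern above can be built programmatically. This is a minimal sketch: the helper name media_url and the VALID_SIZES set are hypothetical, and it only assembles the string described in the answer. Whether Instagram still serves this endpoint is not verified here.

```python
from urllib.parse import quote

# Size letters mentioned in the answer: t, m, or l.
VALID_SIZES = {"t", "m", "l"}


def media_url(shortcode, size="m"):
    """Build the /media/?size=... URL for a post shortcode (hypothetical helper)."""
    if size not in VALID_SIZES:
        raise ValueError("size must be one of: t, m, l")
    # Percent-encode the shortcode so it is safe inside the URL path.
    return "https://www.instagram.com/p/{}/media/?size={}".format(
        quote(shortcode, safe=""), size
    )


print(media_url("Bo7OXJ3hYM8"))
```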

Spring WebSocket Connecting with SockJS to a different domain

Jax's answer was correct 🙂 The registerStompEndpoints method gives us the opportunity to set the allowed origins. We need to add it before the withSockJS() option:

    @Override
    public void registerStompEndpoints(StompEndpointRegistry stompEndpointRegistry) {
        stompEndpointRegistry.addEndpoint("/BO/socket").setAllowedOrigins("*").withSockJS();
    }

How to fix “403 Forbidden” errors when calling APIs using Python requests?

It seems the page rejects GET requests that do not identify a User-Agent. I visited the page with a browser (Chrome) and copied the User-Agent header of the GET request (look in the Network tab of the developer tools):

    import requests

    url = "http://worldagnetwork.com/"
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like …

401 Unauthorized vs 403 Forbidden: Which is the right status code for when the user has not logged in? [duplicate]

The most satisfying, once-and-for-all answer I found is:

Short answer: 401 Unauthorized

Description: Authentication comes first (has the user logged in or not?), and only then authorization (does the user have the needed privilege or not?). Here is the point that causes the confusion: isn't "401 Unauthorized" about authorization, not …
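The authentication-then-authorization order described here can be sketched as a small decision function. The name status_for and its two boolean parameters are hypothetical, chosen only to illustrate the order of the checks; a real application would derive them from its session and permission logic.

```python
def status_for(is_authenticated, has_privilege):
    """Pick an HTTP status code for a request to a protected resource."""
    # Step 1: authentication. A user who has not logged in gets 401.
    if not is_authenticated:
        return 401
    # Step 2: authorization. A logged-in user without the privilege gets 403.
    if not has_privilege:
        return 403
    # Both checks passed.
    return 200


print(status_for(False, False))  # not logged in
print(status_for(True, False))   # logged in, lacks the privilege
print(status_for(True, True))    # logged in and allowed
```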

Fetch a Wikipedia article with Python

You need to use urllib2, which supersedes urllib in the Python 2 standard library, in order to change the user agent. Straight from the examples:

    import urllib2

    opener = urllib2.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    infile = opener.open('http://en.wikipedia.org/w/index.php?title=Albert_Einstein&printable=yes')
    page = infile.read()
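urllib2 exists only in Python 2; in Python 3 the same opener API lives in urllib.request. The following is a sketch of a Python 3 equivalent, not part of the original answer. The fetch helper is a hypothetical wrapper, and it is only defined here, not called, so nothing is downloaded.

```python
from urllib.request import build_opener

# Build an opener and replace its default "Python-urllib/x.y" User-Agent.
opener = build_opener()
opener.addheaders = [("User-agent", "Mozilla/5.0")]


def fetch(url):
    """Return the raw response body for url (hypothetical helper)."""
    # The context manager closes the connection once the body has been read.
    with opener.open(url) as infile:
        return infile.read()
```

Calling fetch with the Wikipedia URL from the answer would return the page as bytes, which can be decoded with .decode("utf-8") if text is needed.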