Why is Python 3 http.client so much faster than python-requests?

Question

Profiling both versions suggests that the main difference is DNS resolution: the requests version performs a lookup on every request, while the http.client version performs one only once.

# http.client
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1974    0.541    0.000    0.541    0.000 {method 'recv_into' of '_socket.socket' objects}
     1000    0.020    0.000    0.045    0.000 feedparser.py:470(_parse_headers)
    13000    0.015    0.000    0.563    0.000 {method 'readline' of '_io.BufferedReader' objects}
...

# requests
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1481    0.827    0.001    0.827    0.001 {method 'recv_into' of '_socket.socket' objects}
     1000    0.377    0.000    0.382    0.000 {built-in method _socket.gethostbyname}
     1000    0.123    0.000    0.123    0.000 {built-in method _scproxy._get_proxy_settings}
     1000    0.111    0.000    0.111    0.000 {built-in method _scproxy._get_proxies}
    92000    0.068    0.000    0.284    0.000 _collections_abc.py:675(__iter__)
...

You’re providing the hostname to http.client.HTTPConnection() once, so it makes sense that the name is resolved only once, when the connection is opened. requests.Session could in principle cache hostname lookups, but the profile suggests it does not.
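To make the http.client side concrete, here is a minimal sketch of the pattern being profiled: one HTTPConnection, created once with the host, then reused for every request, with cProfile output sorted by tottime like the tables above. The local test server, `HelloHandler`, `fetch_many`, and `profile_fetch` are all hypothetical names introduced for illustration. The sketch targets 127.0.0.1 so it runs without network access, which also means no real DNS lookup takes place in it.

```python
import cProfile
import http.client
import io
import pstats
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Tiny stand-in server so the sketch needs no external host."""

    # HTTP/1.1 keeps the TCP connection open between requests.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_many(host, port, count):
    """Issue `count` GET requests over a single HTTPConnection."""
    conn = http.client.HTTPConnection(host, port)  # host is supplied once, here
    statuses = []
    try:
        for _ in range(count):
            conn.request("GET", "/")
            response = conn.getresponse()
            response.read()  # drain the body so the connection can be reused
            statuses.append(response.status)
    finally:
        conn.close()
    return statuses


def profile_fetch(count=20):
    """Profile fetch_many against a local server; return (statuses, report)."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    profiler = cProfile.Profile()
    try:
        statuses = profiler.runcall(fetch_many, "127.0.0.1", port, count)
    finally:
        server.shutdown()
        server.server_close()
    report = io.StringIO()
    pstats.Stats(profiler, stream=report).sort_stats("tottime").print_stats(5)
    return statuses, report.getvalue()


if __name__ == "__main__":
    statuses, report = profile_fetch()
    print(report)
```

Pointing `fetch_many` at a real hostname instead of the local server would be the way to see how resolution shows up in the profile for this pattern.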

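EDIT: After some further research, it’s not just a simple matter of caching. requests calls a function to decide whether to bypass proxies, and that function ends up invoking gethostbyname on every request, regardless of the actual request itself. This is consistent with the _scproxy calls that appear once per request in the profile above.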
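One way to check where the lookups come from is to count calls to gethostbyname during a proxy-bypass check. The sketch below assumes the check ultimately goes through the standard library's `urllib.request.proxy_bypass`, which is what the bypass check in requests appears to delegate to on non-Windows platforms; that is an assumption to verify against the installed requests version. The helper name `count_lookups_in_proxy_bypass` is hypothetical, and the resolver is replaced with a fake so that no real DNS query is made.

```python
import socket
import urllib.request
from unittest import mock


def count_lookups_in_proxy_bypass(host, repeats=3):
    """Return how often proxy_bypass(host) calls socket.gethostbyname."""
    calls = []

    def fake_gethostbyname(name):
        calls.append(name)   # record the lookup
        return "127.0.0.1"   # placeholder answer; no real DNS query is made

    # Patch the resolver only for the duration of the check.
    with mock.patch.object(socket, "gethostbyname", side_effect=fake_gethostbyname):
        for _ in range(repeats):
            urllib.request.proxy_bypass(host)
    return len(calls)


if __name__ == "__main__":
    print(count_lookups_in_proxy_bypass("example.com"))
```

The count is platform dependent. It may be zero where the check only reads environment variables, and on macOS it depends on the system proxy settings. If the proxy check does turn out to be the source, setting `trust_env = False` on the requests.Session is one commonly suggested way to skip environment proxy detection, at the cost of ignoring any configured proxies.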
