Is there a way to increase the API Rate limit or to bypass it altogether for GitHub?

This is only a partial solution, because the limit is still 5,000 API calls
per hour, or roughly 83 calls per minute, which is really not that much.
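
Before deciding on a workaround, it helps to watch how fast that budget drains. Here is a minimal sketch for checking your remaining quota via the `/rate_limit` endpoint (which GitHub does not count against the limit itself); the token is a placeholder for your own personal access token:

```python
# Minimal sketch: query the current core rate-limit budget.
import requests

TOKEN = "ghp_..."  # hypothetical placeholder for a personal access token

resp = requests.get(
    "https://api.github.com/rate_limit",
    headers={"Authorization": f"token {TOKEN}"},
)
core = resp.json()["resources"]["core"]
print(f"{core['remaining']} of {core['limit']} calls left, resets at {core['reset']}")
```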

I am writing a tool that compares over 350 repositories in an organization
and finds correlations between them.
The tool uses Python for Git/GitHub access, but I don't think that is the
relevant point here.

After some initial success, I found that the GitHub API is too limited, both
in the number of calls and in bandwidth, if you really want to ask the
repositories a lot of deep questions.

Therefore, I switched to a different approach:

Instead of doing everything with the GitHub API, I wrote a mirror script
that clones all of those repositories in parallel via pygit2, in less than
15 minutes.
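
Here is a minimal sketch of that idea (not my actual script): it assumes a `repos.txt` file with one clone URL per line and mirrors public repositories as bare clones; private repositories would additionally need credential callbacks:

```python
# Sketch: mirror a list of repositories in parallel as bare clones.
import os
from concurrent.futures import ThreadPoolExecutor

import pygit2

MIRROR_DIR = "mirrors"  # hypothetical local target directory

def mirror(url):
    name = url.rstrip("/").rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[:-4]
    path = os.path.join(MIRROR_DIR, name + ".git")
    if not os.path.exists(path):
        # bare=True gives a mirror-style clone without a working tree
        pygit2.clone_repository(url, path, bare=True)
    return name

with open("repos.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

# Cloning is I/O bound, so a thread pool parallelizes it well.
with ThreadPoolExecutor(max_workers=8) as pool:
    for name in pool.map(mirror, urls):
        print("mirrored", name)
```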

Then, I wrote everything possible against the local repositories using pygit2.
This solution became faster by a factor of 100 or more, because there was
neither an API nor a bandwidth bottleneck.
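
For illustration, a small example of such a local question, assuming the bare mirrors created above: counting commits per author, without a single API call:

```python
# Sketch: walk a local repository's history and count commits per author.
from collections import Counter

import pygit2

repo = pygit2.Repository("mirrors/some-repo.git")  # hypothetical mirror path

authors = Counter()
# Walk the full history from HEAD; everything happens on local disk.
for commit in repo.walk(repo.head.target, pygit2.GIT_SORT_TIME):
    authors[commit.author.name] += 1

for name, count in authors.most_common(10):
    print(f"{count:6d}  {name}")
```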

Of course, this took extra effort, because the pygit2 API is quite different
from github3.py, which I had preferred for the GitHub API part.

And that is actually my conclusion/advice:
The most efficient way to work with lots of Git data is:

  • clone all repos you are interested in, locally

  • write everything possible using pygit2, locally

  • write other things, like public/private status, pull requests, wiki
    pages, issues, etc., using the github3.py API or whatever client you
    prefer (see the sketch after this list).
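
As a sketch of the third point, assuming github3.py 1.x and a hypothetical organization name (note that iterating issues and pull requests still consumes API calls, just far fewer than asking content questions through the API):

```python
# Sketch: fetch API-only metadata, while repo contents stay in local mirrors.
import github3

TOKEN = "ghp_..."  # placeholder personal access token
gh = github3.login(token=TOKEN)

for repo in gh.organization("my-org").repositories():  # hypothetical org name
    open_issues = sum(1 for _ in repo.issues(state="open"))
    open_prs = sum(1 for _ in repo.pull_requests(state="open"))
    print(repo.name, "private" if repo.private else "public",
          open_issues, "issues,", open_prs, "PRs")
```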

This way, you can maximize your throughput, while your limiting factor
becomes the quality of your own program (which is also non-trivial).
