
Comments (4)

karlcow commented on July 23, 2024

The API rate limit is 5000 HTTP requests per hour (not minutes as said above).

Let's say we want to back up issues.
For a repo with more than 5,000 issues, this starts to become a problem.

The theoretical limit is

  • 1.38888 requests per second.

So we could artificially set a timer of one request per second and we would be safe.
A backup of issues in a repo with 40,000+ issues would "only" take 11h 6m 40s.
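
For reference, the arithmetic behind those figures (a quick check in a Python shell):

>>> 5000 / 3600          # hourly limit expressed per second
1.3888888888888888
>>> divmod(40000, 3600)  # 40,000 one-per-second requests, split into hours
(11, 400)
>>> divmod(400, 60)      # remaining seconds split into minutes and seconds
(6, 40)

So 40,000 requests at one per second is 11h 6m 40s. The retrieval code below (quoted from python-github-backup) has no pause between successful requests; only the 502 retry path sleeps.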

def retrieve_data(args, template, query_args=None, single_request=False):
    return list(retrieve_data_gen(args, template, query_args, single_request))


def retrieve_data_gen(args, template, query_args=None, single_request=False):
    auth = get_auth(args)
    query_args = get_query_args(query_args)
    per_page = 100
    page = 0

    while True:
        # Paginate through the API, 100 items per request.
        page = page + 1
        request = _construct_request(per_page, page, query_args, template, auth)  # noqa
        r, errors = _get_response(request, auth, template)

        status_code = int(r.getcode())

        # Retry up to three times on 502 Bad Gateway.
        retries = 0
        while retries < 3 and status_code == 502:
            print('API request returned HTTP 502: Bad Gateway. Retrying in 5 seconds')
            retries += 1
            time.sleep(5)
            request = _construct_request(per_page, page, query_args, template, auth)  # noqa
            r, errors = _get_response(request, auth, template)

            status_code = int(r.getcode())

        if status_code != 200:
            template = 'API request returned HTTP {0}: {1}'
            errors.append(template.format(status_code, r.reason))
            log_error(errors)

        response = json.loads(r.read().decode('utf-8'))
        if len(errors) == 0:
            if type(response) == list:
                for resp in response:
                    yield resp

                if len(response) < per_page:
                    # Last page reached.
                    break
            elif type(response) == dict and single_request:
                yield response

        if len(errors) > 0:
            log_error(errors)

        if single_request:
            break

There is also this piece of code, which uses rate limiting, but only after an error has already occurred.

def _request_http_error(exc, auth, errors):
    # HTTPError behaves like a Response so we can
    # check the status code and headers to see exactly
    # what failed.
    should_continue = False
    headers = exc.headers
    limit_remaining = int(headers.get('x-ratelimit-remaining', 0))

    if exc.code == 403 and limit_remaining < 1:
        # The X-RateLimit-Reset header includes a
        # timestamp telling us when the limit will reset
        # so we can calculate how long to wait rather
        # than inefficiently polling:
        gm_now = calendar.timegm(time.gmtime())
        reset = int(headers.get('x-ratelimit-reset', 0)) or gm_now
        # We'll never sleep for less than 10 seconds:
        delta = max(10, reset - gm_now)

        limit = headers.get('x-ratelimit-limit')
        print('Exceeded rate limit of {} requests; waiting {} seconds to reset'.format(limit, delta),  # noqa
              file=sys.stderr)

        if auth is None:
            print('Hint: Authenticate to raise your GitHub rate limit',
                  file=sys.stderr)

        time.sleep(delta)
        should_continue = True

    return errors, should_continue

The strategy could be slightly different.

  • Counting the HTTP requests: n
  • Marking the time of the first request: t₀ (seconds)
  • Time of the current request: t_c (seconds)
  • rate, an optional parameter: rate ≤ 1.38

if n > (t_c - t₀) × rate:
    wait 1 sec before next request
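
A minimal sketch of that counter-based strategy (illustrative only; the SimpleThrottle name, the monotonic clock, and where to hook it in are my assumptions, not code from the project):

import time

class SimpleThrottle:
    """Keep the average request rate at or below `rate` requests per second."""

    def __init__(self, rate=1.38):
        self.rate = rate
        self.t0 = None   # time of the first request (t₀)
        self.n = 0       # number of requests made so far (n)

    def wait(self):
        """Call this right before issuing each API request."""
        now = time.monotonic()
        if self.t0 is None:
            self.t0 = now
        # If more requests have been sent than the elapsed time allows
        # at `rate` requests per second, pause one second.
        if self.n > (now - self.t0) * self.rate:
            time.sleep(1)
        self.n += 1

Calling wait() just before each _construct_request call in retrieve_data_gen would keep a long backup under the hourly budget proactively, instead of reacting to a 403 after the fact.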

from python-github-backup.

garymoon commented on July 23, 2024

I am successfully using @eht16's throttling (💙) to keep below the rate limit when backing up very large orgs. I'm using --throttle-limit 5000 --throttle-pause 0.6 but YMMV. IMO @eht16's work should close this issue 👍
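
For anyone new to the tool: those two options go straight on the github-backup command line, e.g. github-backup SOME_ORG --throttle-limit 5000 --throttle-pause 0.6 together with whatever content flags you normally pass (SOME_ORG is a placeholder; check github-backup --help on your installed version for the exact option names).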

from python-github-backup.

josegonzalez commented on July 23, 2024

There is not. Pull requests welcome.

from python-github-backup.

eht16 commented on July 23, 2024

I've created a very simple throttling approach in #149.
It is not very clever; it simply pauses API requests for a fixed number of seconds, but it helps to stay within the rate limits.
My use case: the GitHub API user used for the backup is also used elsewhere. It doesn't matter how long the backup takes, as long as a few API requests are left for the other uses.
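
As a rough illustration of the fixed-pause idea (a simplified, hypothetical sketch, not the actual #149 code):

import time

_requests_made = 0

def throttle(limit=5000, pause=0.6):
    # Hypothetical sketch: once `limit` requests have been made, sleep
    # `pause` seconds before every further request to slow the backup down.
    global _requests_made
    _requests_made += 1
    if _requests_made > limit:
        time.sleep(pause)

Even a crude pause like this keeps some of the hourly budget free for whatever else is using the same token.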

from python-github-backup.
