gittrends-app / github-proxy-server
A tool for massive data collection from GitHub APIs (REST and GraphQL)
License: MIT License
I am assuming that PyGithub integration is intended, given that you provided sample code for it.
The sample code works because it only calls the MainClass.get_repo method and reads the Repository's non-paginated fields.
Even though the proxy tries to handle pagination, it does so by relying on the response's header fields. However, PyGithub doesn't rely on the header cursors of the GitHub v3 API's response. Instead, it uses the Repository attribute url to create the PaginatedList. Then, the PaginatedList defines self.__nextUrl using the provided URL.
PyGithub should behave the same whether or not I set base_url to http://localhost:3000. So, the code below should list the pull requests of the "hsborges/github-proxy-server" repository.
from github import Github

gh = Github(base_url="http://localhost:3000")
r = gh.get_repo("hsborges/github-proxy-server")
for pr in r.get_pulls():
    print(pr)
Furthermore, the code below should list the labels of the same repository:
for label in r.get_labels():
    print(label)
Both code examples raise an AssertionError (see the full stack trace below).
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In [112], line 1
----> 1 r.get_labels()[0]
File ~/.local/lib/python3.10/site-packages/github/PaginatedList.py:48, in PaginatedListBase.__getitem__(self, index)
46 assert isinstance(index, (int, slice))
47 if isinstance(index, int):
---> 48 self.__fetchToIndex(index)
49 return self.__elements[index]
50 else:
File ~/.local/lib/python3.10/site-packages/github/PaginatedList.py:64, in PaginatedListBase.__fetchToIndex(self, index)
62 def __fetchToIndex(self, index):
63 while len(self.__elements) <= index and self._couldGrow():
---> 64 self._grow()
File ~/.local/lib/python3.10/site-packages/github/PaginatedList.py:67, in PaginatedListBase._grow(self)
66 def _grow(self):
---> 67 newElements = self._fetchNextPage()
68 self.__elements += newElements
69 return newElements
File ~/.local/lib/python3.10/site-packages/github/PaginatedList.py:201, in PaginatedList._fetchNextPage(self)
200 def _fetchNextPage(self):
--> 201 headers, data = self.__requester.requestJsonAndCheck(
202 "GET", self.__nextUrl, parameters=self.__nextParams, headers=self.__headers
203 )
204 data = data if data else []
206 self.__nextUrl = None
File ~/.local/lib/python3.10/site-packages/github/Requester.py:354, in Requester.requestJsonAndCheck(self, verb, url, parameters, headers, input)
352 def requestJsonAndCheck(self, verb, url, parameters=None, headers=None, input=None):
353 return self.__check(
--> 354 *self.requestJson(
355 verb, url, parameters, headers, input, self.__customConnection(url)
356 )
357 )
File ~/.local/lib/python3.10/site-packages/github/Requester.py:454, in Requester.requestJson(self, verb, url, parameters, headers, input, cnx)
451 def encode(input):
452 return "application/json", json.dumps(input)
--> 454 return self.__requestEncode(cnx, verb, url, parameters, headers, input, encode)
File ~/.local/lib/python3.10/site-packages/github/Requester.py:519, in Requester.__requestEncode(self, cnx, verb, url, parameters, requestHeaders, input, encode)
516 self.__authenticate(url, requestHeaders, parameters)
517 requestHeaders["User-Agent"] = self.__userAgent
--> 519 url = self.__makeAbsoluteUrl(url)
520 url = self.__addParametersToUrl(url, parameters)
522 encoded_input = None
File ~/.local/lib/python3.10/site-packages/github/Requester.py:591, in Requester.__makeAbsoluteUrl(self, url)
589 else:
590 o = urllib.parse.urlparse(url)
--> 591 assert o.hostname in [
592 self.__hostname,
593 "uploads.github.com",
594 "status.github.com",
595 "github.com",
596 ], o.hostname
597 assert o.path.startswith((self.__prefix, "/api/"))
598 assert o.port == self.__port
AssertionError: api.github.com
I believe PyGithub's AssertionError is not the problem itself; it only highlights that the requested hostname (api.github.com) differs from the expected one (localhost:3000), which is set in the MainClass constructor (a.k.a. the Github class).
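To make the failing check concrete, here is a minimal sketch of the hostname comparison shown in the last frame of the traceback. It is not PyGithub's code: `host_is_accepted` and `EXTRA_HOSTS` are hypothetical names, and the path and port assertions from the same frame are left out.

```python
from urllib.parse import urlparse

# Hosts accepted in addition to the configured one, as listed in the traceback.
EXTRA_HOSTS = ["uploads.github.com", "status.github.com", "github.com"]


def host_is_accepted(url: str, base_url: str) -> bool:
    """Return True if `url` points at the host configured through `base_url`
    (or at one of the extra hosts), mimicking the hostname assertion."""
    expected = urlparse(base_url).hostname
    return urlparse(url).hostname in [expected, *EXTRA_HOSTS]


base = "http://localhost:3000"
# URL taken from the Repository attributes returned through the proxy.
print(host_is_accepted("https://api.github.com/repos/hsborges/github-proxy-server/labels", base))
# Same URL after its origin is rewritten to the proxy.
print(host_is_accepted("http://localhost:3000/repos/hsborges/github-proxy-server/labels", base))
```

The first call prints `False` and the second `True`, which matches the behaviour described above: the request only passes once the URL carries the proxy's hostname.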
When PyGithub's Repository object is built, it leverages the API response data (JSON) to feed its attributes (see where self._useAttributes is called and where it is declared).
The data fields (archive_url, assignees_url, blobs_url, branches_url, clone_url, collaborators_url, comments_url, commits_url, compare_url, contents_url, contributors_url, deployments_url, downloads_url, events_url, forks_url, git_commits_url, git_refs_url, git_tags_url, git_url, hooks_url, html_url, issue_comment_url, issue_events_url, issues_url, keys_url, labels_url, languages_url, merges_url, milestones_url, mirror_url, notifications_url, pulls_url, releases_url, ssh_url, stargazers_url, statuses_url, subscribers_url, subscription_url, svn_url, tags_url, teams_url, trees_url, url) should have their https://api.github.com occurrences replaced with http://localhost:3000 by github-proxy-server too.
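As an illustration of the rewrite being requested, the sketch below walks a decoded JSON payload and swaps the GitHub API origin for the proxy's. It is written in Python only to match the examples in this issue; it is not github-proxy-server's implementation, and `rewrite_urls` and `GITHUB_API` are hypothetical names.

```python
from typing import Any

GITHUB_API = "https://api.github.com"


def rewrite_urls(payload: Any, proxy_base: str) -> Any:
    """Return a copy of `payload` in which every string value that starts
    with the GitHub API origin points at `proxy_base` instead."""
    if isinstance(payload, dict):
        return {key: rewrite_urls(value, proxy_base) for key, value in payload.items()}
    if isinstance(payload, list):
        return [rewrite_urls(item, proxy_base) for item in payload]
    if isinstance(payload, str) and payload.startswith(GITHUB_API):
        return proxy_base + payload[len(GITHUB_API):]
    # Numbers, booleans, None, and unrelated strings are returned unchanged.
    return payload


repo = {
    "url": "https://api.github.com/repos/hsborges/github-proxy-server",
    "labels_url": "https://api.github.com/repos/hsborges/github-proxy-server/labels{/name}",
    "html_url": "https://github.com/hsborges/github-proxy-server",
}
rewritten = rewrite_urls(repo, "http://localhost:3000")
print(rewritten["url"])
print(rewritten["html_url"])
```

Fields such as html_url, clone_url, or ssh_url do not start with the API origin, so a rewrite of this kind leaves them untouched. Walking the whole payload rather than a fixed list of keys also covers nested objects, at the cost of touching any string that happens to begin with that origin.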
As a client-side workaround, the following code should work:
from github import Github

gh = Github(base_url="http://localhost:3000")
r = gh.get_repo("hsborges/github-proxy-server")
# Point the Repository's url attribute at the proxy instead of api.github.com.
proxy_url = r.url.replace("https://api.github.com", "http://localhost:3000")
r._useAttributes({"url": proxy_url})
for pr in r.get_pulls():
    print(pr)
for label in r.get_labels():
    print(label)
However, replacing the cursors' hostname is github-proxy-server's responsibility.
Currently, the tool requires the user to specify the desired endpoint.
It would be nice to have a solution that detects it automatically.
It would be nice to have an option to ignore the access tokens sent with incoming requests.
This would make adoption easier for users who already have scripts and don't want to make big changes to them.
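One possible shape for such an option, as a rough sketch only: assuming clients send their token in the `Authorization` header, the proxy could drop that header before forwarding the request with its own credentials. `strip_client_token` is a hypothetical helper, not part of github-proxy-server, and it is written in Python only to match the examples above.

```python
def strip_client_token(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of `headers` without the client's Authorization header.
    Header names are compared case-insensitively."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() != "authorization"
    }


incoming = {
    "Accept": "application/vnd.github+json",
    "authorization": "token <client-token>",
}
print(strip_client_token(incoming))
```

With this in place, an existing script could keep passing its own token and only change the base URL.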