josegonzalez / python-github-backup
backup a github user or organization
License: MIT License
Is the '--incremental' mode supposed to work? After performing an '--all' backup, subsequent incremental backups fail reproducibly with this error message ([xxx] redacted by me):
Backing up user [xxx] to [xxx]
Retrieving repositories
Filtering repositories
Backing up repositories
Traceback (most recent call last):
File "/usr/local/bin/github-backup", line 834, in <module>
main()
File "/usr/local/bin/github-backup", line 829, in main
backup_repositories(args, output_directory, repositories)
File "/usr/local/bin/github-backup", line 498, in backup_repositories
last_update = max(repository['updated_at'] for repository in repositories) # noqa
ValueError: max() arg is an empty sequence
I use Python 2.7.13.
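The traceback shows max() receiving an empty sequence, i.e. no repositories were left at the point where the incremental timestamp is computed. A minimal sketch of a guard, assuming each repository dict carries an ISO-8601 updated_at string as the traceback suggests. latest_update is a hypothetical helper, and it avoids max(default=...) so it also runs on the reporter's Python 2.7:

```python
def latest_update(repositories, previous=None):
    """Return the newest 'updated_at' value, or `previous` if there is none.

    Hypothetical helper: calling max() directly on an empty list
    raises ValueError, which is the crash reported here.
    """
    timestamps = [repo['updated_at'] for repo in repositories if repo.get('updated_at')]
    if not timestamps:
        # Nothing to compare: keep the previous marker instead of crashing.
        return previous
    # ISO-8601 strings in one consistent format sort chronologically.
    return max(timestamps)
```

With an empty list, latest_update([], previous='2017-01-01T00:00:00Z') simply hands back the old marker.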
I was having an issue running the code. It was a silly system configuration error I ran into on a new dev environment. Since the code throws a misleading error, I'm putting the issue here in case other people run into it.
https://api.github.com/user timed out
https://api.github.com/user timed out
https://api.github.com/user timed out
https://api.github.com/user timed out
but you know you are connected to the network, run:
cd /Applications/Python\ 3.7/
./Install\ Certificates.command
(change 3.7 to whatever version of Python 3 you are using)
Python 3.7 does not rely on macOS's OpenSSL anymore. It comes with its own bundled OpenSSL and doesn't have access to macOS's root certificates.
In _get_response(), the detail of the URLError is swallowed, which leads you to believe your call is timing out:
except URLError:
    should_continue = _request_url_error(template, retry_timeout)
    if not should_continue:
        raise
I logged the error
except URLError as e:
    log_error(e.reason)
    should_continue = _request_url_error(template, retry_timeout)
    if not should_continue:
        raise
and saw
[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed
I ran the above commands and everything is working
Thanks to the post here that set me straight
https://stackoverflow.com/questions/40684543/how-to-make-python-use-ca-certificates-from-mac-os-truststore
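A sketch of surfacing the cause instead of a generic timeout message: URLError keeps the underlying exception in its reason attribute, so a certificate failure can be told apart from a real timeout. describe_url_error is a hypothetical helper (Python 3), not code from github-backup:

```python
import socket
import ssl
from urllib.error import URLError


def describe_url_error(exc):
    """Return a short, human-readable cause for a URLError."""
    reason = exc.reason
    if isinstance(reason, ssl.SSLError):
        # e.g. [SSL: CERTIFICATE_VERIFY_FAILED] when no CA bundle is installed
        return 'SSL error: {0}'.format(reason)
    if isinstance(reason, socket.timeout):
        return 'timed out'
    # DNS failures, refused connections, etc. arrive as plain OSErrors or strings.
    return 'network error: {0}'.format(reason)
```

Logging this string where _get_response() currently swallows the exception would have shown the certificate problem immediately.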
I thought the --watched option ("include watched repositories in backup") would back up the watched repositories, but it actually only backs up a list of watched repositories.
On GitHub you often contribute to repositories that are not in your own namespace and then watch them.
It would be useful to be able to back up all these repositories (as a new option, for example).
Upon re-running the script on the same output directory with the options --all --all-starred --private --fork --bare --incremental, I noticed that for repositories that already existed, git would return error 128.
I checked the error myself by running the same command in a separate terminal window, and noticed that git was failing to clone because "the destination path already exists and is not an empty directory".
Is there any way to have the script run git remote update for already existing repositories? Or is there any way to force git to clobber the directory (though this might be less efficient than pulling just the changes, since it would clone everything from scratch)?
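A sketch of the clone-or-update decision, assuming one local directory per repository. git_command_for is a hypothetical helper, not github-backup's code. It uses git clone --mirror for bare backups because, unlike --bare, a mirror records a fetch refspec, so a later git remote update refreshes every ref:

```python
import os


def git_command_for(local_dir, remote_url, bare=False):
    """Return the git command that brings local_dir up to date with remote_url."""
    has_work_tree = os.path.isdir(os.path.join(local_dir, '.git'))
    # A bare/mirror clone keeps HEAD, refs/ and objects/ at the top level.
    is_bare_repo = os.path.isfile(os.path.join(local_dir, 'HEAD'))
    if has_work_tree or is_bare_repo:
        # Already cloned: fetch new commits instead of cloning again (avoids exit 128).
        return ['git', '-C', local_dir, 'remote', 'update', '--prune']
    if bare:
        return ['git', 'clone', '--mirror', remote_url, local_dir]
    return ['git', 'clone', remote_url, local_dir]
```

The result can be passed straight to subprocess.check_call(). Note that git -C needs Git 1.8.5 or later.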
I followed all the steps in the docs on creating an OS X keychain password item, but when I run the following command: github-backup $ORG --keychain-name ghbackup --keychain-account laghee -o ./data --issues --issue-events --milestones --labels -R $REPO
I keep getting this error:
No password item matching the provided name and account could be found in the osx keychain.
This is my password item:
I'm on macOS High Sierra (10.13.5).
Is there some other field that needs to be modified that isn't mentioned in the docs?
Since I only need to run one backup at the moment, I entered my credentials directly with the --username and --token flags, and that worked fine. But if I ever had to run several backups or set up a script to back up automatically, this would be a problem.
Since pull requests are also issues in GitHub, pull requests are currently downloaded twice: once as an issue and once as a pull request.
When retrieved as an issue, some information is missing from the API response, if I checked correctly. But you can differentiate issues from pull requests in the API response by the key pull_request, according to the docs at https://developer.github.com/v3/issues/.
I wonder if it makes sense to drop all issues from the response that have the pull_request key set, at least when the user has requested that pull requests be retrieved separately.
If accepted, I could make a PR.
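A sketch of the proposed filter, using the pull_request key described in the issues API docs. The function name and the flag are hypothetical:

```python
def issues_to_keep(issues, pulls_backed_up_separately=True):
    """Drop pull requests from an issues response when PRs are saved on their own.

    Issues that are really pull requests carry a 'pull_request' key.
    """
    if not pulls_backed_up_separately:
        # Without a separate PR backup, the issue copy is the only copy: keep it.
        return list(issues)
    return [issue for issue in issues if 'pull_request' not in issue]
```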
Ran into a python error while testing backing up my organization's repositories.
# github-backup $ORGANIZATION --output-directory ./ --organization --token $GITHUB_ACCESS_TOKEN --all --private --fork --prefer-ssh --name-regex $REPO_NAME
Backing up user <REDACTED> to /root/github_backups
Retrieving repositories
Filtering repositories
Backing up repositories
Traceback (most recent call last):
File "/usr/local/bin/github-backup", line 885, in <module>
main()
File "/usr/local/bin/github-backup", line 880, in main
backup_repositories(args, output_directory, repositories)
File "/usr/local/bin/github-backup", line 555, in backup_repositories
lfs_clone=arg.lfs_clone)
NameError: global name 'arg' is not defined
It looks like there is a reference to an undefined variable in:
Probably just a typo: arg instead of args. I tested changing this to args and the backup script worked great!
I've read through the README, but only saw this PR: #11. What are the criteria here? Why won't the repos clone? They are definitely there and we use them every day with active code.
Skipping name (git@github.com:ORG/name.git) since it's not initialized
I know it exists fine:
$ git ls-remote https://github.com/ORG/name.git
Username for 'https://github.com': name
Password for 'https://name@github.com':
<hash> HEAD
<hash>refs/heads/master
<hash>refs/remotes/origin/HEAD
<hash>refs/remotes/origin/master
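The "not initialized" message suggests the script decides a remote is empty from a check such as git ls-remote (an assumption; the report does not show that code). Empty output alone cannot distinguish an empty repository from a failed call, such as bad credentials or a prompt that never got answered. A sketch that also checks the exit status; both helpers are hypothetical:

```python
import os
import subprocess


def classify_ls_remote(returncode, stdout):
    """Map `git ls-remote` results to 'unreachable', 'empty' or 'has-refs'."""
    if returncode != 0:
        # Authentication failures and wrong URLs also produce no stdout.
        return 'unreachable'
    return 'has-refs' if stdout.strip() else 'empty'


def remote_state(remote_url):
    # GIT_TERMINAL_PROMPT=0 makes git fail instead of waiting for a password prompt.
    env = dict(os.environ, GIT_TERMINAL_PROMPT='0')
    result = subprocess.run(['git', 'ls-remote', remote_url], env=env,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    return classify_ls_remote(result.returncode, result.stdout)
```

Only 'empty' would justify the "not initialized" skip; 'unreachable' should be reported as an error.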
I took a backup of one of my repositories, which contains source files, issues, and a wiki.
I used github-backup to back up my repository. I restored the same repository by pushing the backup
repository to origin master. I got my source files back, but not the issues and wiki.
Can anyone help me with this ?
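Pushing the backed-up repository only restores what lives in git. On GitHub the wiki is a separate git repository (https://github.com/OWNER/REPO.wiki.git), and github-backup stores issues as JSON, so they would have to be recreated through the API. A sketch of turning one backed-up issue into the request body for GitHub's "create an issue" endpoint (POST /repos/{owner}/{repo}/issues, which accepts title, body and labels), assuming the backup JSON has the same shape as the API's issue objects. Both helpers are hypothetical. Original authors, dates and issue numbers cannot be set this way, so the sketch folds them into the body:

```python
def issue_to_create_payload(issue):
    """Build a POST /repos/{owner}/{repo}/issues body from a backed-up issue dict."""
    header = 'Originally opened by @{0} on {1}\n\n'.format(
        issue['user']['login'], issue['created_at'])
    return {
        'title': issue['title'],
        'body': header + (issue.get('body') or ''),
        'labels': [label['name'] for label in issue.get('labels', [])],
    }


def wiki_remote(owner, repo):
    """Push a wiki backup here (the wiki must be enabled on the target repo)."""
    return 'https://github.com/{0}/{1}.wiki.git'.format(owner, repo)
```

Comments would need a similar pass against the issue comments endpoint.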
We are using Git LFS (Large File Storage) in our repositories.
From our testing, it seems this is not supported by the script; is that correct?
If it's not supported, do you have any plans to add support for LFS clones?
Thank you for this tool. It is working great for me to back up repositories, wikis, and issues. Two things I noticed missing that prevent it from being a complete backup are lists of milestones and labels for a repository. I know this information is essentially contained within the issues, but it would be useful to have them in their own separate lists, especially to retain labels/milestones without associated issues.
The GitHub API has simple endpoints for each:
GET /repos/:owner/:repo/milestones?state=all
GET /repos/:owner/:repo/labels
Thanks for your consideration!
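A sketch of fetching both lists with a personal access token, using the two endpoints above. The helper names are hypothetical, and only the first page (up to 100 items) is fetched:

```python
import json
import urllib.request

API = 'https://api.github.com'


def milestone_and_label_urls(owner, repo):
    """state=all includes closed milestones, which the default (open) omits."""
    base = '{0}/repos/{1}/{2}'.format(API, owner, repo)
    return {
        'milestones': base + '/milestones?state=all&per_page=100',
        'labels': base + '/labels?per_page=100',
    }


def fetch_json(url, token):
    """GET one page of JSON, authenticating with a token header."""
    request = urllib.request.Request(url, headers={
        'Authorization': 'token ' + token,
        'Accept': 'application/vnd.github.v3+json',
    })
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode('utf-8'))
```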
Hi, I'm an owner of an organization (https://github.com/CEDIA-models) and I want to back up issues from a single private repository.
My idea was to log in with my account (francoislauger) and access the organization's repository with:
github-backup -u USER francoislauger -O -P -R CEDIA-models/REPO --issues --issue-comments
I get a prompt for a password, but after that I get this:
Requesting https://api.github.com/user?per_page=100&page=1
API request returned HTTP 403: Forbidden
I'm sure I'm doing something wrong, but I'm not sure what.
I've been poking around trying to get this to work and I keep getting the "too few arguments" error. I'm digging into the source code now, but a plain ol' example would be a nice addition to the README. Thanks.
After installing github-backup via pip install github-backup or this git clone ...
bash: /usr/bin/github-backup: No such file or directory
Distribution: Ubuntu 16.04
Maybe it's because I had previously installed another, older tool named github-backup.
Fixed by
cp /usr/local/bin/github-backup /usr/bin/github-backup
Hi! I didn't let it get very far before killing it, but when I ran the command like this:
github-backup -u ethus3h -t [token] --all --repositories -P -F --gists --starred-gists --hooks --milestones --labels --bare --lfs --wikis --starred --all-starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-commits --pull-details --hooks SomeoneElsesUsername
It started downloading all of my (ethus3h's) starred repositories instead of the specified user's. I tried leaving out the -u and -t options, but that gave 401 Unauthorized.
This seems to prevent downloading the specified user's starred repositories.
It looks like this also affects starred gists.
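The REST API has two starred-repository endpoints: GET /user/starred for the authenticated user and GET /users/{username}/starred for any user. A sketch of choosing between them; the helper name is hypothetical. For gists there is only GET /gists/starred, which always refers to the authenticated user, so starred gists of someone else cannot be fetched this way:

```python
def starred_repos_url(target_user, authenticated_login, api='https://api.github.com'):
    """Pick the starred-repos endpoint for the user being backed up."""
    if authenticated_login and target_user == authenticated_login:
        return api + '/user/starred'
    # Works for any user's public stars, with or without authentication.
    return '{0}/users/{1}/starred'.format(api, target_user)
```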
Is there a way to restore a repo (or all repos) with issues, PRs, etc., either to GitHub, GitLab, or somewhere else?
This is needed because select.select on Windows only supports sockets, not other streams (stderr, for example).
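A common portable alternative to select.select() is one reader thread per pipe. This is a swapped-in technique, not the project's logging_subprocess(); the sketch just collects each stream's lines, and run_and_capture is a hypothetical name:

```python
import subprocess
import sys
import threading


def _pump(stream, sink):
    # readline() blocks, which is fine because each pipe has its own thread.
    for line in iter(stream.readline, ''):
        sink.append(line.rstrip('\n'))
    stream.close()


def run_and_capture(cmd):
    """Run cmd and return (exit_code, stdout_lines, stderr_lines) without select()."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    out, err = [], []
    threads = [threading.Thread(target=_pump, args=(proc.stdout, out)),
               threading.Thread(target=_pump, args=(proc.stderr, err))]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # both pipes are drained concurrently, so neither can fill up
    return proc.wait(), out, err
```

For example, run_and_capture([sys.executable, '-c', 'print("hi")']) returns (0, ['hi'], []).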
I'd like to back up multiple repositories from my account, but not the whole account, as I have forked various big repositories that I made only small (or no) contributions to. I found no way to do this out of the box, as it is only possible to select a single repository for backup.
Calling the program multiple times with different repositories and the same backup directory does not work either, as the last_update state is not tracked per repository, but only once in the backup directory root.
Would you agree on fixing this, either by making it possible to give multiple repositories to backup, or by moving the last_update state into the repository directory in the backup dir (or preferably both)? I'd be willing to provide pull requests for at least a part of the proposed changes.
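A sketch of the second proposal, keeping the incremental marker inside each repository's backup directory so that separate runs for different repositories don't overwrite each other's state. The file name last_update.json and both helpers are hypothetical:

```python
import json
import os

STATE_FILE = 'last_update.json'  # hypothetical per-repository state file


def read_last_update(repo_dir):
    """Return the stored timestamp for this repository, or None on first run."""
    path = os.path.join(repo_dir, STATE_FILE)
    if not os.path.exists(path):
        return None
    with open(path) as fh:
        return json.load(fh).get('last_update')


def write_last_update(repo_dir, timestamp):
    os.makedirs(repo_dir, exist_ok=True)
    path = os.path.join(repo_dir, STATE_FILE)
    tmp = path + '.tmp'
    with open(tmp, 'w') as fh:
        json.dump({'last_update': timestamp}, fh)
    # Swap in the finished file so a crash never leaves a half-written marker.
    os.replace(tmp, path)
```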
In issue #38 I read that this program doesn't include anything to restore the repository. Have any attempts been made at creating such a script (by third parties, perhaps), or is it expected to be added to this program in the future?
Avoid the git error when a repo has no wiki.
When I run github-backup on a multiuser server, other users can see my token using, e.g., htop. Is there a way to read the token from a file so that it is not shown in clear text in the script's arguments?
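A sketch of one way to keep the secret out of the process list: pass a reference instead of the token itself. The file: and env: prefixes are conventions invented for this example, not existing github-backup options. Such a file should be readable only by its owner (chmod 600):

```python
import os


def resolve_token(value):
    """Turn a token argument into the actual token.

    Hypothetical conventions: 'file:PATH' reads the token from PATH,
    'env:NAME' reads it from environment variable NAME, anything else
    is taken literally.
    """
    if value.startswith('file:'):
        with open(os.path.expanduser(value[len('file:'):])) as fh:
            return fh.read().strip()  # tolerate a trailing newline
    if value.startswith('env:'):
        return os.environ[value[len('env:'):]]
    return value
```

Only the string file:~/.github-token would then appear in htop, not the token.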
I'm guessing 0.7.0 (assuming you're following http://semver.org/), since there are new features (#28).
Thanks for the quick merges!
github-backup -u mikebz -p **** --all -O tempoautomation
The organization has about 80 repos. I ran the script with --all and -P and it only pulled a few down. I'm not sure what the issue is, but it doesn't seem to traverse the full set of repos.
Is there a trick to get this script to work?
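One thing worth ruling out is pagination: the API returns at most 100 repositories per page (30 by default) and points to the next page in the Link header, so a client that stops after the first response sees only part of the list. A sketch of reading that header; next_page_url is a hypothetical helper:

```python
import re

# Matches entries like: <https://api.github.com/...&page=2>; rel="next"
_LINK_RE = re.compile(r'<([^>]+)>;\s*rel="([^"]+)"')


def next_page_url(link_header):
    """Return the rel="next" URL from a GitHub Link header, or None on the last page."""
    if not link_header:
        return None
    for url, rel in _LINK_RE.findall(link_header):
        if rel == 'next':
            return url
    return None
```

A listing loop would keep requesting next_page_url(response.headers.get('Link')) until it returns None.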
Having a JSON backup of which repositories are starred is fine, so long as the repositories themselves are never removed from GitHub. Sometimes they disappear, so it would be useful to optionally clone all starred repositories.
Is this beyond the scope of the project?
Thanks!
Hello,
python-github-backup invokes git fetch with the --tags option. The meaning of the --tags option changed in Git 1.9.0:
The meanings of the "--tags" option to "git fetch" has changed; the command fetches tags in addition to what is fetched by the same command line without the option.
What you have now should produce expected results on later Git releases, but any of your users with earlier Git releases may unknowingly be relying upon incomplete backups.
On these earlier Git releases, git fetch --tags will only fetch objects reachable from tag refs, not branch refs. In a repo with no tags, this means that nothing will be fetched. This is particularly dangerous because the backup will look hunky-dory at the time the user first experiments with python-github-backup (the initial git clone will fetch everything) and because no errors will be reported from then on. (To be sure, this is not an erroneous condition from Git's perspective, hence no errors.)
This is easy enough to fix for future users. Some options I can think of:
- Note a suggested Git version requirement in the README.
- Check git --version and bail out with a conspicuous failure message if it is earlier than 1.9.0. (The better-safe-than-sorry approach.)
- Replace the --tags option with something that will work on most Git releases still in production circulation today, perhaps an explicit set of refspecs.

Warning existing users about this problem may prove more difficult.
I am unsure which Git releases are distributed with Debian-derived LTS operating systems, but I do know that RHEL 7-derived operating systems ship with Git 1.8. Anyone running an enterprise distro is likely to get burnt here.
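A sketch of the bail-out option: parse git --version and refuse to run below 1.9.0. The helper names are hypothetical; the regex only assumes the usual "git version X.Y.Z..." output, including vendor suffixes:

```python
import re
import subprocess

MIN_GIT = (1, 9, 0)


def parse_git_version(output):
    """Parse 'git version 1.8.3.1' or 'git version 2.17.1.windows.2' into ints."""
    match = re.search(r'(\d+)\.(\d+)(?:\.(\d+))?', output)
    if not match:
        raise ValueError('unrecognised git --version output: %r' % output)
    return tuple(int(part or 0) for part in match.groups())


def require_git(minimum=MIN_GIT):
    output = subprocess.check_output(['git', '--version'], universal_newlines=True)
    version = parse_git_version(output)
    if version < minimum:
        raise SystemExit('git %s is too old: before 1.9.0, fetch --tags skips branches'
                         % '.'.join(map(str, version)))
    return version
```

For the refspec option, a bare mirror could instead fetch +refs/heads/*:refs/heads/* and +refs/tags/*:refs/tags/* explicitly, which behaves the same on old and new Git.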
Release 0.13.0 added support for fetching a Github personal access token (PAT) that had been previously stored in the OSX Keychain. It would be good to extend this support to Windows and Linux also.
It looks like functionality to interact with system credential stores for various operating systems from Python already exists. For example the Python keyring package (Github repository) supports OSX Keychain, Windows Credential Vault, Freedesktop Secret Service (requires secretstorage) and KWallet (requires dbus).
By using something like keyring for the token storage, cross-platform support should be easy to add.
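A sketch of a lookup through the third-party keyring package, whose keyring.get_password(service, username) returns None when nothing is stored. token_from_keyring and the service name are hypothetical; get_password can be injected so the logic is testable without a real credential store:

```python
def token_from_keyring(service, username, get_password=None):
    """Look up a GitHub PAT in the OS credential store via `keyring`."""
    if get_password is None:
        import keyring  # third-party: pip install keyring
        get_password = keyring.get_password
    token = get_password(service, username)
    if token is None:
        raise LookupError('no credential stored for {0}/{1}'.format(service, username))
    return token
```

The token could be stored once with the package's CLI, e.g. keyring set github-backup USERNAME, and then looked up with token_from_keyring('github-backup', USERNAME).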
If we can support the system credential stores for the main operating systems, we could potentially streamline the user experience for provision of credentials. However, if we create a new user workflow for this, we need to ensure we don't get in the way of users who manage their credentials using git itself (e.g. SSH keys or a credential.helper).
A potential option might be to deprecate the --keychain-name and --keychain-account arguments in favour of an argument that tells github-backup to look in the system credential store for a PAT. This will preserve the existing behaviour of proceeding to call the Github API / git with no credentials if no credential-related arguments are passed to github-backup. Two potential options for this are:
- A new argument (e.g. --use-system-keyring)
- Overloading the -t token argument to interpret a special value as an instruction to use the system credential store (e.g. -t use-system-keyring)
When the new argument is provided, a potential user interaction workflow might be:
github-backup for the credential name and the GitHub username argument (-u) as the username.
@josegonzalez What are your thoughts on the above user interaction change suggestion?
When the rate limit is exceeded, we sleep until the hourly request quota is reset. After waking up, the script prints the rate limit error ('No more requests remaining') and then quits instead of resuming the backup.
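A sketch of the intended behaviour: after sleeping until X-RateLimit-Reset (UTC epoch seconds), retry the same request instead of exiting. fetch is a hypothetical callable returning (status, headers, body); GitHub signals an exhausted primary limit with HTTP 403 or 429 and X-RateLimit-Remaining: 0:

```python
import time


def seconds_until_reset(headers, now=None, margin=5):
    """Seconds to wait until the rate-limit window refills, plus a small margin."""
    now = time.time() if now is None else now
    return max(0, int(headers['X-RateLimit-Reset']) - now) + margin


def get_with_rate_limit(fetch, sleep=time.sleep):
    """Call fetch() until it is not rate limited; return (status, body)."""
    while True:
        status, headers, body = fetch()
        limited = status in (403, 429) and headers.get('X-RateLimit-Remaining') == '0'
        if not limited:
            return status, body
        sleep(seconds_until_reset(headers))
        # Loop around and retry the same request rather than treating it as fatal.
```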
python 3.6
git 2.11
Running the following command:
github-backup.py -t oauthtoken -o backupPath --all -P -O orguser
It runs, downloads a repo, then terminates with the errors below. If I add --skip-existing, it downloads the next repo but then terminates with the same message. Same behavior if I change --all to --repositories.
github-backup.py : Traceback (most recent call last):
At line:1 char:1
github-backup.py -t $OAuthToken -o $Backupfolder --repositories -P -O --skip-exi ...
    CategoryInfo          : NotSpecified: (Traceback (most recent call last)::String) [], RemoteException
    FullyQualifiedErrorId : NativeCommandError
  File "C:\Program Files (x86)\Python36-32\Scripts\github-backup.py", line 815, in <module>
    main()
  File "C:\Program Files (x86)\Python36-32\Scripts\github-backup.py", line 810, in main
    backup_repositories(args, output_directory, repositories)
  File "C:\Program Files (x86)\Python36-32\Scripts\github-backup.py", line 518, in backup_repositories
    bare_clone=args.bare_clone)
  File "C:\Program Files (x86)\Python36-32\Scripts\github-backup.py", line 750, in fetch_repository
    logging_subprocess(git_command, None)
  File "C:\Program Files (x86)\Python36-32\Scripts\github-backup.py", line 89, in logging_subprocess
    check_io()
  File "C:\Program Files (x86)\Python36-32\Scripts\github-backup.py", line 79, in check_io
    1000)[0]
OSError: [WinError 10038] An operation was attempted on something that is not a socket
The github-backup script cannot be imported because of how it's named. I think most of the code should be moved into the github_backup package, and the executable script file should be minimal. This will help with #49 and with writing code that calls this as a library.
Seems to work well for my personal account.
However, we have an organization account (no username/password, no personal token).
I tried running it with my own account's token, but it found nothing.
I really was hoping to use this to back up our organization's account.
(Also, any plans for a bitbucket-backup? I assume it's probably quite similar.)
Hi, I am trying to back up an org that has public and private repos. I am using a service account with a token with privileges on public and private repos, but this line:
python-github-backup/bin/github-backup
Line 527 in f8be345
prevents the backup from running because org != user. A quick test with that line commented out got it working, but I'd like to understand the purpose of that check, or whether removing it will have some unwanted side effect.
If I run
~/.local/bin/github-backup -u xxxxxx@xxx XXX -o repos --all --private
From my account on an organisation where I have access to a set of private repos but have never forked or cloned anything I get a message like
Skipping YYY (https://xxxxx@xxx:*****@github.com/XXX/YYY.git) since it's not initalized
The repos reported with this message are not backed up; even so, the script exits with a success (0) return code.
Looking at the repos that I am backing up, they all have content and are initialised, so I don't understand this message. I assume it's not telling me that the target repository is not initialised, since I'd expect the script to create that itself.
Can we restore the organisation to another account after a successful backup? If so, please describe the process.
With #87 and #92 I realized that there is no API to get the starred gists of another user. This means --starred-gists will never work for the specified user, only for the logged-in user.
I'm curious what you think the solution here should be. I'm thinking:
--starred-gists
It would be nice if incremental backups could detect force pushes or changes to tags and keep the abandoned trees around with a specific label. Maybe a reflog-style feature? Or just generate well-known branches? This way we wouldn't need to fear that backing up would lose old state (without having to keep multiple full backups).
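A sketch of the detection step, assuming ref snapshots are taken before and after each fetch (for example from git for-each-ref --format='%(refname) %(objectname)'). A ref counts as rewritten if it vanished or its old commit is not an ancestor of the new one; is_ancestor could wrap git merge-base --is-ancestor OLD NEW, which exits 0 when true. The refs/backup/ namespace and both helpers are hypothetical:

```python
def rewritten_refs(before, after, is_ancestor):
    """Return {ref: old_sha} for refs whose old commit a fetch would orphan."""
    lost = {}
    for ref, old in before.items():
        new = after.get(ref)
        if new == old:
            continue
        # Deleted ref, force push, or moved tag: the old commit is no longer reachable.
        if new is None or not is_ancestor(old, new):
            lost[ref] = old
    return lost


def preserve_commands(lost, stamp):
    """git update-ref commands that keep each lost commit reachable."""
    return [['git', 'update-ref',
             'refs/backup/{0}/{1}'.format(stamp, ref[len('refs/'):]), sha]
            for ref, sha in sorted(lost.items())]
```

Running these right after the fetch works because fetch does not delete objects; they only become prunable once unreferenced.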
Command:
Printed extra output to highlight this:
[logged_in_user@host myorg-github-backup]$ pipenv run github-backup MYORG -u mtdeguzis --prefer-ssh --organization --private --output-directory=/hadoop/backups/github/github-backup --all
Backing up user MYORG to /hadoop/backups/github/github-backup
Password:
Requesting https://api.github.com/user?per_page=100&page=1
Retrieving repositories
ARGS: Namespace(all_starred=False, bare_clone=False, fork=False, github_host=None, include_everything=True, include_followers=False, include_following=False, include_gists=False, include_hooks=False,
include_issue_comments=False, include_issue_events=False, include_issues=False, include_labels=False, include_milestones=False, include_pull_comments=False, include_pull_commits=False,
include_pull_details=False, include_pulls=False, include_repository=False, include_starred=False, include_starred_gists=False, include_watched=False, include_wiki=False, incremental=False,
languages=None, lfs_clone=False, name_regex=None, organization=True, osx_keychain_item_account=None, osx_keychain_item_name=None, output_directory='/hadoop/backups/github/github-backup',
password='SOME_PASS', prefer_ssh=True, private=True, repository=None, skip_existing=False, token=None, user='MYORG', username='mtdeguzis')
USER: MYORG
AUTH USER: mtdeguzis
I printed out some extra info to show the issue.
Code block:
def retrieve_repositories(args, authenticated_user):
    log_info('Retrieving repositories')
    single_request = False
    if args.user == authenticated_user['login']:
        # we must use the /user/repos API to be able to access private repos
        template = 'https://{0}/user/repos'.format(
            get_github_api_host(args))
    else:
        if args.private:
            log_error('Authenticated user is different from user being backed up, thus private repositories cannot be accessed')
        template = 'https://{0}/users/{1}/repos'.format(
            get_github_api_host(args),
            args.user)
    if args.organization:
        template = 'https://{0}/orgs/{1}/repos'.format(
            get_github_api_host(args),
            args.user)
    if args.repository:
        single_request = True
        template = 'https://{0}/repos/{1}/{2}'.format(
            get_github_api_host(args),
            args.user,
            args.repository)
I believe the issue is with this line:
if args.user == authenticated_user['login']:
Should be:
if args.username == authenticated_user['login']:
Otherwise, the script assumes MYORG is the username. When I changed the check to args.username, things ran as expected.
When running python-github-backup I get the following error:
Retrieving repositories
Traceback (most recent call last):
File "/home/cg/.local/bin/github-backup", line 666, in <module>
main()
File "/home/cg/.local/bin/github-backup", line 659, in main
repositories = retrieve_repositories(args)
File "/home/cg/.local/bin/github-backup", line 389, in retrieve_repositories
return retrieve_data(args, template, single_request=single_request)
File "/home/cg/.local/bin/github-backup", line 259, in retrieve_data
r, errors = _get_response(request, template)
File "/home/cg/.local/bin/github-backup", line 302, in _get_response
errors, should_continue = _request_http_error(exc, auth, errors) # noqa
NameError: global name 'auth' is not defined
My organization has a lot of GitHub data that we want to back up nightly to a Drobo. I have been attempting to use this program to build this out, but I keep hitting the API rate limit, which times out the request for an increasing amount of time. Is there a way to tell the program to limit its requests so that the data comes in steadily without hitting the threshold of 5000 requests per hour?
Hi
Is there some way to export a log-file at the end of a backup-job? I would like to send a report to a Slack channel through a WebHook.
Thanks
We are running the backup as a cron job on an hourly basis. I've noticed that the process sometimes hangs.
Command: /usr/local/bin/github-backup -O -P -F --all --prefer-ssh -t <token> -o /home/info/github-backup <repo> > /dev/null
Attaching strace to the process shows nothing:
sudo strace -p 19794
Process 19794 attached - interrupt to quit
read(4,
It doesn't show any further progress.
Do you need other information from the process? The process is still hanging on my server.
Hi, it would be great if we also created the directory (and intermediate directories) for an explicitly specified -o path, just like the default does when creating the repositories directory.
(ghbackup)jchen@rousseau> github-backup -t <token> --repositories -o fly fly
Specified output directory is not a directory: /Users/jchen/tmp/gh/fly
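A sketch of the requested behaviour with os.makedirs(..., exist_ok=True) (Python 3.2+). prepare_output_directory is a hypothetical name; it still rejects a path that exists as a regular file, reusing the error text shown above:

```python
import os


def prepare_output_directory(path):
    """Create `path` and any missing parents; fail only if it exists as a non-directory."""
    path = os.path.abspath(os.path.expanduser(path))
    if os.path.exists(path) and not os.path.isdir(path):
        raise SystemExit('Specified output directory is not a directory: ' + path)
    os.makedirs(path, exist_ok=True)  # no error if it already exists
    return path
```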
The backup works well, but in order to be a valid backup, there has to be some way to restore all that information back to GitHub. I know this is a monolithic request, possibly more than doubling the work put into the project so far. Just thought I'd put this out there since I didn't see any past issues about it.
When you don't have admin access to the repo (for example if you only have contributor access), you don't have the right to see hooks.
Currently that means getting:
Retrieving <user> hooks
API request returned HTTP 404: Not Found
and then the program exits.
It should indeed display a warning, but it shouldn't stop there. (You might have the rights to get the hooks in some repos and not others.)
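A sketch of warning and moving on instead of exiting. fetch is a hypothetical callable that performs GET /repos/{owner}/{repo}/hooks and raises urllib's HTTPError on failure; the report shows GitHub answering 404 when the caller lacks admin rights, and 403 is treated the same way here:

```python
import logging
from urllib.error import HTTPError

log = logging.getLogger('github-backup-sketch')


def fetch_hooks_or_skip(repo_name, fetch):
    """Return a repository's hooks, or None if this token may not read them."""
    try:
        return fetch(repo_name)
    except HTTPError as exc:
        if exc.code in (403, 404):
            log.warning('Skipping hooks for %s: HTTP %d', repo_name, exc.code)
            return None
        raise  # other failures are still fatal
```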
If I remove a repository from GitHub, it might be desirable to remove its files from my backups as well. Usually these repos have no value anymore, or they are stored elsewhere.
So, similar to rsync's --delete argument, repositories that no longer exist could be removed from the backup folder.
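A sketch of the detection half, assuming one directory per repository directly under a backup root (an assumption about layout). stale_backups is hypothetical and only lists candidates; actual deletion would best stay behind an explicit opt-in flag, as with rsync's --delete:

```python
import os


def stale_backups(backup_root, current_names):
    """Names of backed-up repositories that no longer appear in the API listing."""
    if not os.path.isdir(backup_root):
        return []
    local = {entry for entry in os.listdir(backup_root)
             if os.path.isdir(os.path.join(backup_root, entry))}
    return sorted(local - set(current_names))
```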
This would be a nice touch for contributors!
github-backup <NAME> --private --organization --output-directory ~/gitbak --all
Retrieving repositories
Filtering repositories
Backing up repositories
Retrieving <NAME> starred repositories
Writing 0 starred repositories to disk
Retrieving <NAME> watched repositories
Writing 0 watched repositories to disk
$ tree ~/gitbak/
/home/user/gitbak/
└── account
    ├── starred.json
    └── watched.json
1 directory, 2 files
It would also be nice to have a --debug option for additional output about what is going on. I can clone our repos via SSH just fine, and leaving out the SSH option did not make a difference.
I think it also needs to be changed at https://github.com/josegonzalez/python-github-backup/blob/master/bin/github-backup#L516, like it was in https://github.com/josegonzalez/python-github-backup/pull/92/files.
kyan@elegiac ~/gh-2018oct05/3 nohup compiz --replace openbox $ github-backup OtherUser -t [redacted] --repositories
Backing up user OtherUser to /home/kyan/gh-2018oct05/3
Retrieving repositories
Requesting https://api.github.com/user/repos?per_page=100&page=1
Requesting https://api.github.com/user/repos?per_page=100&page=2
Requesting https://api.github.com/user/repos?per_page=100&page=3
Filtering repositories
Backing up repositories
kyan@elegiac ~/gh-2018oct05/3 nohup compiz --replace openbox $
There should be tests for this. It can start off as just a shell script that runs github-backup to make sure it can at least display the help message without erroring. It can later be turned into unit tests, but #50 needs to happen first.