python-github-backup's Issues

--incremental mode not working

Is the '--incremental' mode supposed to work? After performing an '--all' backup, subsequent incremental backups fail reproducibly with this error message ([xxx] redacted by me):

Backing up user [xxx] to [xxx]
Retrieving repositories
Filtering repositories
Backing up repositories
Traceback (most recent call last):
  File "/usr/local/bin/github-backup", line 834, in <module>
    main()
  File "/usr/local/bin/github-backup", line 829, in main
    backup_repositories(args, output_directory, repositories)
  File "/usr/local/bin/github-backup", line 498, in backup_repositories
    last_update = max(repository['updated_at'] for repository in repositories) # noqa
ValueError: max() arg is an empty sequence

I use Python 2.7.13.
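
The traceback points at max() being called on an empty sequence, which would happen when no repositories survive filtering. A minimal sketch of a guard, assuming repositories is a list of dicts with an 'updated_at' key as in the traceback. The helper name latest_update is hypothetical and not part of github-backup:

```python
def latest_update(repositories):
    """Return the newest 'updated_at' value, or None if the list is empty."""
    timestamps = [repository['updated_at'] for repository in repositories]
    if not timestamps:
        # Nothing was selected: let the caller keep the previous marker.
        return None
    # ISO 8601 timestamps in the same zone compare correctly as strings.
    return max(timestamps)
```

A caller could skip rewriting the last_update marker whenever this returns None.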

Calls always timing out

I was having an issue running the code. It was a silly system config error I ran into on a new dev environment. Since the code reports a misleading error, I'm putting the issue here in case other people run into it.

TLDR:

If you keep seeing this

https://api.github.com/user timed out
https://api.github.com/user timed out
https://api.github.com/user timed out
https://api.github.com/user timed out

but know you are connected to the network

Then run

cd /Applications/Python\ 3.7/
./Install\ Certificates.command

(change 3.7 to whatever version of python 3 you are using)

More detail:

Python 3.7 no longer relies on macOS's OpenSSL. It ships with its own bundled OpenSSL, which doesn't have access to the macOS root certificates.

In _get_response() the detail of the URLError is swallowed, which leads you to believe your call is timing out:

except URLError:
    should_continue = _request_url_error(template, retry_timeout)
    if not should_continue:
        raise

I logged the error

except URLError as e:
    log_error(e.reason)
    should_continue = _request_url_error(template, retry_timeout)
    if not should_continue:
        raise

and saw
[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed
I ran the above commands and everything is working
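
For anyone who wants the log to name the cause up front, here is a minimal sketch that inspects URLError.reason, assuming Python 3. The function name describe_url_error and the message wording are my own, not something in github-backup:

```python
import socket
import ssl
from urllib.error import URLError


def describe_url_error(exc):
    """Turn a URLError into a message that names the underlying cause."""
    reason = exc.reason
    if isinstance(reason, ssl.SSLError):
        # Covers certificate verification failures such as the one above.
        return 'TLS failure (check your CA certificates): {0}'.format(reason)
    if isinstance(reason, socket.timeout):
        return 'request timed out: {0}'.format(reason)
    return 'request failed: {0}'.format(reason)
```

Passing the caught exception to this helper inside the except URLError block would give a log line that separates certificate problems from real timeouts.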

Thanks to the post here that set me straight
https://stackoverflow.com/questions/40684543/how-to-make-python-use-ca-certificates-from-mac-os-truststore

--all option does not include everything

Thanks for the tool, it's excellent 👍

I just got confused by the description of --all: its meaning of "everything" seems to be very user-centric, as I had to add flags to get the forked and watched repos.

Other than that all went well and worked like a charm.
Thanks again for the good work!

Backup "watched" repositories

I thought the --watched option ("include watched repositories in backup") would back up the watched repositories themselves, but it only backs up a list of them.

On GitHub you often contribute to repositories that are not in your own namespace and then watch them.
It would be useful to be able to back up all of these repositories (as a new option, for example).

Update clone if local repository already exists

Upon re-running the script on the same output directory, with these options --all --all-starred --private --fork --bare --incremental, I noticed that for repositories that already existed, git would return error 128.

I checked the error myself by running the same command in a separate terminal window, and noticed that git was failing to clone because "the destination path already exists and is not an empty directory".

Is there any way to have the script run git remote update for already existing repositories? Or, is there any way to force git into clobbering the directory (though this might be less efficient than pulling just the changes, since it would be cloning everything from scratch)?
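
A minimal sketch of the first idea: pick git remote update when the destination already holds a clone, and git clone otherwise. The function name git_sync_command and its return shape are hypothetical, and this is not how github-backup itself decides:

```python
import os


def git_sync_command(remote_url, local_dir, bare=True):
    """Return (argv, cwd) for bringing local_dir up to date with remote_url."""
    if os.path.isdir(local_dir) and os.listdir(local_dir):
        # Existing, non-empty directory: fetch changes instead of re-cloning.
        return ['git', 'remote', 'update'], local_dir
    command = ['git', 'clone']
    if bare:
        command.append('--bare')
    # cwd of None means "run from the current working directory".
    return command + [remote_url, local_dir], None
```

The pair can be handed to subprocess.check_call(argv, cwd=cwd). The sketch does not check that an existing directory really is a git repository.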

Existing password item in OSX keychain not found

I followed all the steps in the doc on creating an OSX keychain password item, but when I run the following command: github-backup $ORG --keychain-name ghbackup --keychain-account laghee -o ./data --issues --issue-events --milestones --labels -R $REPO

I keep getting this error:

No password item matching the provided name and account could be found in the osx keychain.

This is my password item:

[Two screenshots of the keychain password item, taken 2018-07-02]

I'm on macOS High Sierra (10.13.5).

Is there some other field that needs to be modified that isn't mentioned in the docs?

Since I only need to run one backup at the moment, I directly entered my credentials with the --username and --token flags, and that worked fine. But if I ever had to run several or set a script to back up automatically ... this would be a problem.

Pull requests are stored twice

Since pull requests are also issues in GitHub, pull requests are currently downloaded twice: once as an issue and once as a pull request.
When retrieved as an issue, some information is missing from the API response, if I checked correctly. But you can tell issues from pull requests in the API response by the pull_request key, according to the docs at https://developer.github.com/v3/issues/.

I wonder if it makes sense to drop all issues from the response which have the pull_request key set. At least if retrieving pull requests separately is requested by the user.

If accepted, I could make a PR.
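
A minimal sketch of the proposed filter, assuming the issues response has already been decoded into a list of dicts. The function name drop_pull_requests is hypothetical:

```python
def drop_pull_requests(issues):
    """Keep only true issues; pull requests carry a 'pull_request' key."""
    return [issue for issue in issues if 'pull_request' not in issue]
```

This would only be applied when the user also asked for pull requests to be backed up separately, so that nothing is lost by default.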

NameError: global name 'arg' is not defined

Ran into a python error while testing backing up my organization's repositories.

# github-backup $ORGANIZATION --output-directory ./ --organization --token $GITHUB_ACCESS_TOKEN --all --private --fork --prefer-ssh --name-regex $REPO_NAME
Backing up user <REDACTED> to /root/github_backups
Retrieving repositories
Filtering repositories
Backing up repositories
Traceback (most recent call last):
  File "/usr/local/bin/github-backup", line 885, in <module>
    main()
  File "/usr/local/bin/github-backup", line 880, in main
    backup_repositories(args, output_directory, repositories)
  File "/usr/local/bin/github-backup", line 555, in backup_repositories
    lfs_clone=arg.lfs_clone)
NameError: global name 'arg' is not defined

It looks like there is a reference to an undefined variable in:

Probably just a typo: arg instead of args. I tested changing this to args and the backup script worked great!

Skipping .. since it's not initialized?

I am going through the readme, but only saw this PR: #11. What are the criteria here? Why won't the repos clone? They are definitely there and we use them every day with active code.

Skipping name (git@github.com:ORG/name.git) since it's not initialized

I know it exists fine:

$ git ls-remote https://github.com/ORG/name.git
Username for 'https://github.com': name
Password for 'https://name@github.com':
<hash>        HEAD
<hash>        refs/heads/master
<hash>        refs/remotes/origin/HEAD
<hash>        refs/remotes/origin/master

Unable to restore issues and wiki

I took a backup of one of my repositories, which contains source files, issues and a wiki.
I used github-backup to back up my repository. I restored the same repository by pushing the backup
repository to origin master. I got my source files back, but not the issues and wiki.

Can anyone help me with this?

Does this support LFS clones?

We are using Git LFS (Large File Storage) in our repositories.

From our testing, it looks like this is not supported by this script, correct?

If it's not supported, do you have any plans to add support for LFS clones?

Feature Request: Milestone and Label backup

Thank you for this tool. It is working great for me to backup repositories, wikis, and issues. Two things I noticed missing that prevent it from being a complete backup are lists of milestones and labels for a repository. I know this information is essentially contained within the issues, but it would be useful to have them in their own separate lists, especially to retain labels/milestones without associated issues.

The GitHub API has simple endpoints for each:

  • Milestones - GET /repos/:owner/:repo/milestones?state=all
  • Labels - GET /repos/:owner/:repo/labels
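
The two endpoints above can be sketched as URL builders. The function names and the api_host parameter are hypothetical, chosen only for illustration:

```python
def milestones_url(owner, repo, api_host='api.github.com'):
    """URL listing every milestone (open and closed) of a repository."""
    return 'https://{0}/repos/{1}/{2}/milestones?state=all'.format(
        api_host, owner, repo)


def labels_url(owner, repo, api_host='api.github.com'):
    """URL listing every label of a repository."""
    return 'https://{0}/repos/{1}/{2}/labels'.format(api_host, owner, repo)
```

Both responses are paginated, so a real implementation would still need to follow the pages.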

Thanks for your consideration!

Question - how to backup a private repository from an organisation.

Hi, I'm an owner of an organization (https://github.com/CEDIA-models) and I want to back up issues from a single private repository.

My idea was to log in with my account (francoislauger) and access the organization directly:
github-backup -u USER francoislauger -O -P -R CEDIA-models/REPO --issues --issue-comments
I get a prompt for a password, but after that I get this:

Requesting https://api.github.com/user?per_page=100&page=1
API request returned HTTP 403: Forbidden

I'm sure I'm doing something wrong but not sure what.

Is there a bare-bones working example somewhere?

I've been poking around trying to get this to work and I keep getting the "too few arguments" error. Digging into the source code now, but a plain ol' example would be a nice addition to the readme. Thanks.

No such file or directory

After installing github-backup via pip install github-backup or this git clone ...
bash: /usr/bin/github-backup: No such file or directory

Distribution: Ubuntu 16.04

Maybe it's because I had previously installed another, slightly older tool named github-backup.

Fixed by
cp /usr/local/bin/github-backup /usr/bin/github-backup

Wrong user's starred repositories downloaded

Hi! I didn't let it get very far before killing it, but when I ran the command like this:

github-backup -u ethus3h -t [token] --all --repositories -P -F --gists --starred-gists --hooks --milestones --labels --bare --lfs --wikis --starred --all-starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-commits --pull-details --hooks SomeoneElsesUsername

It started downloading all of my (ethus3h's) starred repositories, instead of the specified username's. I tried not including the -u and -t options, but it gave 401 Unauthorized.

This seems to prevent downloading the specified user's starred repositories.

It looks like this also affects starred gists.
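
For starred repositories, the GitHub API has both /user/starred (the authenticated user) and /users/:username/starred (any user). A minimal sketch of choosing between them, where the function name starred_url and its parameters are hypothetical:

```python
def starred_url(target_user, authenticated_login, api_host='api.github.com'):
    """Pick the starred-repositories endpoint for the user being backed up."""
    if target_user == authenticated_login:
        # Only this endpoint reflects the token owner's own stars.
        return 'https://{0}/user/starred'.format(api_host)
    return 'https://{0}/users/{1}/starred'.format(api_host, target_user)
```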

not running on windows

It fails because select.select is only supported on sockets on Windows, not on other streams (stderr, for example).
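
One portable alternative is to drain the child's pipes with threads instead of select. A minimal sketch, assuming it is acceptable to collect the output rather than log it live. The function name run_and_collect is hypothetical:

```python
import subprocess
import threading


def run_and_collect(command):
    """Run command, returning (returncode, stdout_lines, stderr_lines)."""
    process = subprocess.Popen(command, stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE,
                               universal_newlines=True)
    collected = {'stdout': [], 'stderr': []}

    def drain(stream, name):
        # A blocking readline works on pipes on every platform.
        for line in iter(stream.readline, ''):
            collected[name].append(line.rstrip('\n'))
        stream.close()

    readers = [
        threading.Thread(target=drain, args=(process.stdout, 'stdout')),
        threading.Thread(target=drain, args=(process.stderr, 'stderr')),
    ]
    for reader in readers:
        reader.start()
    for reader in readers:
        reader.join()
    return process.wait(), collected['stdout'], collected['stderr']
```

Reading both pipes concurrently also avoids the deadlock that can occur when one pipe fills up while the other is being read.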

Backing up multiple repositories but not the whole account

I'd like to back up multiple repositories from my account, but not the whole thing, as I have forked various big repositories I just made some small (or no) contributions to. I found no way to do this out of the box, as it is only possible to select a single repository for backup.
Calling the program multiple times with different repositories and the same backup directory does not work either, as the last_update state is not tracked per repository, but only once in the backup directory root.

Would you agree on fixing this, either by making it possible to give multiple repositories to backup, or by moving the last_update state into the repository directory in the backup dir (or preferably both)? I'd be willing to provide pull requests for at least a part of the proposed changes.
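
A minimal sketch of the second proposal, keeping last_update inside each repository's backup directory. It assumes Python 3, and the file name and both function names are hypothetical:

```python
import os


def read_last_update(repo_dir):
    """Return the timestamp stored for this repository, or None."""
    path = os.path.join(repo_dir, 'last_update')
    if not os.path.exists(path):
        return None
    with open(path) as handle:
        return handle.read().strip() or None


def write_last_update(repo_dir, timestamp):
    """Record the timestamp inside the repository's own backup directory."""
    os.makedirs(repo_dir, exist_ok=True)
    with open(os.path.join(repo_dir, 'last_update'), 'w') as handle:
        handle.write(timestamp)
```

With the state stored per repository, separate runs against the same output directory no longer overwrite each other's marker.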

Restoring backups

In issue #38 I read that this program doesn't include anything to restore the repository. Are any attempts made in creating such a script (by 3rd parties perhaps) or is it expected to be added to this program in the future?

only 3 out of 80+ repos pulled down on an org

github-backup -u mikebz -p **** --all -O tempoautomation

the organization has about 80 repos. I ran the script with --all and -P and it only pulled a few down. Not sure what the issue is, but it doesn't seem to want to traverse the full set of repos.

Is there a trick to get this script to work?

Also clone starred repositories?

Having a JSON backup of what repositories are starred is fine, so long as the repositories themselves are never removed from Github. Sometimes they disappear so it would be useful to optionally clone all stars.

Is this beyond the scope of the project?

Thanks!

Possible silent data loss with Git < 1.9.0 on git fetch

Hello,

python-github-backup invokes git fetch with the --tags option. The meaning of the --tags option changed in Git 1.9.0:

The meanings of the "--tags" option to "git fetch" has changed; the command fetches tags in addition to what is fetched by the same command line without the option.

What you have now should produce expected results on later Git releases, but any of your users with earlier Git releases may unknowingly be relying upon incomplete backups.

On these earlier Git releases, git fetch --tags will only fetch objects reachable from tag refs, not branch refs. In a repo with no tags, this means that nothing will be fetched. This is particularly dangerous because the backup will look hunky dory at the time the user first experiments with python-github-backup (the initial git clone will fetch everything) and because no errors will be reported from then on. (To be sure, this is not an erroneous condition from Git's perspective, hence, no errors.)

This is easy enough to fix for future users. Some options I can think of:

  • Stick a prominent caveat in your README with a suggested Git version requirement.
  • Modify python-github-backup to parse a version string from git --version and bail out with a conspicuous failure message if it is earlier than 1.9.0. (The better safe than sorry approach.)
  • Replace the --tags option with something that will work on most Git releases still in production circulation today. Perhaps an explicit set of refspecs.
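
The version check in the second option could be sketched like this. The function names are hypothetical, and the input is assumed to be the text printed by git --version:

```python
import re


def parse_git_version(version_output):
    """Extract (major, minor, patch) from the output of `git --version`."""
    match = re.search(r'(\d+)\.(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError(
            'unrecognised git version: {0!r}'.format(version_output))
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def git_is_safe_for_fetch_tags(version_output, minimum=(1, 9, 0)):
    """True if this Git has the 1.9.0 meaning of `git fetch --tags`."""
    return parse_git_version(version_output) >= minimum
```

The caller would obtain version_output with something like subprocess.check_output(['git', '--version']) and bail out with a conspicuous message when the check fails.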

Warning existing users of this problem may prove more difficult.

I am unsure which Git releases are distributed with Debian-derived LTS operating systems, but I do know that RHEL 7-derived operating systems ship with Git 1.8. Anyone running an enterprise distro is likely to get burnt here.

Add cross platform credential store support

Description

Release 0.13.0 added support for fetching a Github personal access token (PAT) that had been previously stored in the OSX Keychain. It would be good to extend this support to Windows and Linux also.

Notes

Existing Python packages

It looks like functionality to interact with system credential stores for various operating systems from Python already exists. For example the Python keyring package (Github repository) supports OSX Keychain, Windows Credential Vault, Freedesktop Secret Service (requires secretstorage) and KWallet (requires dbus).

By using something like keyring for the token storage, cross-platform support should be easy to add.

Potential improvement to user credential provision

If we can support the system credential stores for the main operating systems, we could potentially streamline the user experience for provision of credentials. However, if we create a new user workflow for this, we need to ensure we don't get in the way of users who manage their credentials using git itself (e.g. SSH keys or using a credential.helper).

A potential option might be to deprecate the --keychain-name and --keychain-account arguments in favour of an argument that tells github-backup to look in the system credential store for a PAT. This will preserve the existing behaviour of proceeding to call the Github API / git with no credentials if no credential-related arguments are passed to github-backup. Two potential options for this are:

  1. Add a new boolean argument (e.g. --use-system-keyring)
  2. Extend the -t token argument to interpret a special value as an instruction to use the system credential store (e.g. -t use-system-keyring)

When the new argument is provided, a potential user interaction workflow might be:

  • Automatically look for a default credential name + username pair in the system credential store. I would suggest using github-backup for the credential name and the Github username argument (-u) as the username.
  • If the credential exists, then retrieve the PAT.
  • If the credential does not exist, then prompt the user to enter a PAT at the command prompt and create a new credential using the default credential name + username so they will not be prompted for it next time.
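
The workflow above can be sketched as follows. To keep the example free of third-party imports, the credential store is passed in as two callables. The function name load_or_prompt_token and its parameters are hypothetical:

```python
def load_or_prompt_token(username, get_password, set_password, prompt,
                         service_name='github-backup'):
    """Return a stored PAT, prompting for and saving one on first use."""
    token = get_password(service_name, username)
    if token:
        return token
    # Nothing stored yet: ask once, then remember it for next time.
    token = prompt('Personal access token for {0}: '.format(username))
    set_password(service_name, username, token)
    return token
```

With the keyring package installed, the call would be load_or_prompt_token(user, keyring.get_password, keyring.set_password, getpass.getpass).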

@josegonzalez What are your thoughts on the above user interaction change suggestion?

Quits after resuming from rate limit sleep

When the rate limit is exceeded we sleep until the hourly requests are reset. After waking up the script prints the rate limit error ('No more requests remaining') then quits instead of resuming backup.
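
A minimal sketch of the expected behaviour: sleep until the reset time, then retry instead of quitting. It relies on GitHub's X-RateLimit-Remaining and X-RateLimit-Reset response headers. The function names and the send callable, which is assumed to return (status, headers, body), are hypothetical:

```python
import time


def rate_limit_delay(status, headers, now=None):
    """Seconds to sleep if the rate limit rejected this response, else 0."""
    limited = (status in (403, 429)
               and headers.get('X-RateLimit-Remaining') == '0')
    if not limited:
        return 0
    now = time.time() if now is None else now
    reset_at = int(headers.get('X-RateLimit-Reset', now))
    # One extra second so the retry lands after the reset.
    return max(0, reset_at - int(now)) + 1


def request_with_resume(send, sleep=time.sleep, max_attempts=3):
    """Call send(); when rate limited, sleep and retry rather than exit."""
    for _ in range(max_attempts):
        status, headers, body = send()
        delay = rate_limit_delay(status, headers)
        if delay == 0:
            return status, body
        sleep(delay)
    raise RuntimeError('rate limit still exhausted after retrying')
```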

failing on windows

python 3.6
git 2.11

Running the following command:

github-backup.py -t oauthtoken -o backupPath --all -P -O orguser

It runs, downloads a repo, then terminates with the errors below. If I add --skip-existing it downloads the next repo but then terminates with the same message. Same behavior if I change --all to --repositories.

github-backup.py : Traceback (most recent call last):
At line:1 char:1
github-backup.py -t $OAuthToken -o $Backupfolder --repositories -P -O --skip-exi ...

     CategoryInfo          : NotSpecified: (Traceback (most recent call last)::String) [], RemoteException
     FullyQualifiedErrorId : NativeCommandError
 
  File "C:\Program Files (x86)\Python36-32\Scripts\github-backup.py", line 815, in <module>
    main()
  File "C:\Program Files (x86)\Python36-32\Scripts\github-backup.py", line 810, in main
    backup_repositories(args, output_directory, repositories)
  File "C:\Program Files (x86)\Python36-32\Scripts\github-backup.py", line 518, in backup_repositories
    bare_clone=args.bare_clone)
  File "C:\Program Files (x86)\Python36-32\Scripts\github-backup.py", line 750, in fetch_repository
    logging_subprocess(git_command, None)
  File "C:\Program Files (x86)\Python36-32\Scripts\github-backup.py", line 89, in logging_subprocess
    check_io()
  File "C:\Program Files (x86)\Python36-32\Scripts\github-backup.py", line 79, in check_io
    1000)[0]
OSError: [WinError 10038] An operation was attempted on something that is not a socket

Refactor to be more easily importable

The github-backup package cannot be imported because of how it's named. I think most of the code should be moved to the github_backup source code and the executable script file should be minimal. This will help with #49 and with writing code that calls this as a library.

Backup support for organization accounts?

Seems to work well for my personal account.

However we have an organization account (no username/password, no personal token)

I tried running my own account's token against it - but it found nothing.

I really was hoping to use this to back up our organization's account.

(Also, any plans for a bitbucket-backup? I assume it's probably quite similar.)

backing up private repos on org with different user.

Hi, I am trying to backup an org that has public and private repos. I am using a service account with a token with privileges on public and private repos, but this line:

if args.private:

prevents the backup from running because org != user. A quick test with that line commented out got it working, but I'd like to understand the purpose of that check and whether removing it will have some unwanted side effect.

Skipping YYY (https://xxxxx@xxx:*****@github.com/XXX/YYY.git) since it's not initalized

If I run

~/.local/bin/github-backup -u xxxxxx@xxx XXX -o repos  --all --private

From my account on an organisation where I have access to a set of private repos but have never forked or cloned anything I get a message like

Skipping YYY (https://xxxxx@xxx:*****@github.com/XXX/YYY.git) since it's not initalized

The repos listed with this message are not backed up, yet the script still returns a success (0) exit code.

Looking at the repos that I am backing up, they all have content and are initialised, so I don't understand this message. I assume it's not telling me that the target repository is not initialised, since I'd expect the script to create that itself.

Restore Option

Can we restore the organisation to another account after a successful backup? If yes, please describe the process.

--starred-gists will never work for the specified user

With #87 and #92 I realized that there is no API to get the starred gists of another user. This means the --starred-gists will never work for the specified user, only the logged in user.

I'm curious what you think the solution here should be. I'm thinking:

  1. Document this, maybe print a warning if you use --starred-gists
  2. Remove the feature

Abandoned branches (force push) and changed labels

It would be nice if the incremental backups could detect force pushes or changes to tags and keep the abandoned trees around under a specific label. Maybe a reflog-style feature? Or just generate well-known branches? This way we don't need to fear that backing up would lose old state (without having multiple full backups).

Args assigns users as backup target after PR #97

Command:
Printed extra output to highlight this:

[logged_in_user@host myorg-github-backup]$ pipenv run github-backup MYORG -u mtdeguzis --prefer-ssh --organization --private --output-directory=/hadoop/backups/github/github-backup --all
Backing up user MYORG to /hadoop/backups/github/github-backup
Password: 
Requesting https://api.github.com/user?per_page=100&page=1
Retrieving repositories
ARGS: Namespace(all_starred=False, bare_clone=False, fork=False, github_host=None, include_everything=True, include_followers=False, include_following=False, include_gists=False, include_hooks=False, include_issue_comments=False, include_issue_events=False, include_issues=False, include_labels=False, include_milestones=False, include_pull_comments=False, include_pull_commits=False, include_pull_details=False, include_pulls=False, include_repository=False, include_starred=False, include_starred_gists=False, include_watched=False, include_wiki=False, incremental=False, languages=None, lfs_clone=False, name_regex=None, organization=True, osx_keychain_item_account=None, osx_keychain_item_name=None, output_directory='/hadoop/backups/github/github-backup', password='SOME_PASS', prefer_ssh=True, private=True, repository=None, skip_existing=False, token=None, user='MYORG', username='mtdeguzis')
USER: MYORG
AUTH USER: mtdeguzis

I printed out some extra info to show the issue.

Code block:

def retrieve_repositories(args, authenticated_user):
    log_info('Retrieving repositories')
    single_request = False
    if args.user == authenticated_user['login']:
        # we must use the /user/repos API to be able to access private repos
        template = 'https://{0}/user/repos'.format(
            get_github_api_host(args))
    else:
        if args.private:
            log_error('Authenticated user is different from user being backed up, thus private repositories cannot be accessed')
        template = 'https://{0}/users/{1}/repos'.format(
            get_github_api_host(args),
            args.user)

    if args.organization:
        template = 'https://{0}/orgs/{1}/repos'.format(
            get_github_api_host(args),
            args.user)

    if args.repository:
        single_request = True
        template = 'https://{0}/repos/{1}/{2}'.format(
            get_github_api_host(args),
            args.user,
            args.repository)

I believe the issue is with this line:

    if args.user == authenticated_user['login']:

Should be:

    if args.username == authenticated_user['login']:

Otherwise, the script assumes MYORG is the username. When I changed the check to args.username, things ran as expected.

NameError: global name 'auth' is not defined

When running python-github-backup I get the following error:

Retrieving repositories
Traceback (most recent call last):
  File "/home/cg/.local/bin/github-backup", line 666, in <module>
    main()
  File "/home/cg/.local/bin/github-backup", line 659, in main
    repositories = retrieve_repositories(args)
  File "/home/cg/.local/bin/github-backup", line 389, in retrieve_repositories
    return retrieve_data(args, template, single_request=single_request)
  File "/home/cg/.local/bin/github-backup", line 259, in retrieve_data
    r, errors = _get_response(request, template)
  File "/home/cg/.local/bin/github-backup", line 302, in _get_response
    errors, should_continue = _request_http_error(exc, auth, errors)  # noqa
NameError: global name 'auth' is not defined

How to prevent hitting API call rate limit

My organization has a lot of GitHub data that we want to perform nightly backups of to a Drobo. I have been attempting to use this program to build it out, but I keep hitting the API rate limit, which times out the request for an increasing amount of time. Is there a way to tell the program to limit its requests so that the data coming in is steady but stays under the 5000 requests per hour threshold?
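
One way to keep the request rate steady is to space calls evenly. A minimal sketch, assuming GitHub's limit of 5000 requests per hour for authenticated requests. The Throttle class is hypothetical and not an existing github-backup option:

```python
import time


class Throttle(object):
    """Space calls so that at most `limit` happen per `period` seconds."""

    def __init__(self, limit=5000, period=3600.0,
                 clock=time.time, sleep=time.sleep):
        self.interval = period / float(limit)
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None

    def wait(self):
        """Block until the next request is allowed, then reserve a slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

Calling wait() before each API request gives one request roughly every 0.72 seconds with the defaults, at the cost of a slower backup.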

Logging / verbose / tail

Hi

Is there some way to export a log-file at the end of a backup-job? I would like to send a report to a Slack channel through a WebHook.

Thanks

Backup sometimes hangs

We are running the backup as a cron job on an hourly basis. I've noticed that the process sometimes hangs.

Command: /usr/local/bin/github-backup -O -P -F --all --prefer-ssh -t <token> -o /home/info/github-backup <repo> > /dev/null

Attaching strace to the process shows nothing:

sudo strace -p 19794 
Process 19794 attached - interrupt to quit
read(4,

It doesn't show any further progress.

Do you need other information from the process? The process is still hanging on my server.

create missing directories for specified output path

Hi, it would be great if we also created the directory (and intermediate directories) for an explicitly specified -o path, just like the default does with creating the repositories directory.

(ghbackup)jchen@rousseau> github-backup -t <token> --repositories -o fly fly
Specified output directory is not a directory: /Users/jchen/tmp/gh/fly
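
A minimal sketch of the requested behaviour, assuming Python 3. The function name ensure_output_directory is hypothetical; the error text mirrors the message shown above:

```python
import os


def ensure_output_directory(path):
    """Create path and any missing parents; reject existing non-directories."""
    path = os.path.abspath(os.path.expanduser(path))
    if os.path.exists(path) and not os.path.isdir(path):
        raise ValueError(
            'Specified output directory is not a directory: {0}'.format(path))
    os.makedirs(path, exist_ok=True)
    return path
```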

Feature Request: Restore

The backup works well, but in order to be a valid backup, there has to be some way to restore all that information back to GitHub. I know this is a monolithic request, possibly more than doubling the work put into the project so far. Just thought I'd put this out there since I didn't see any past issues about it.

Don't stop everything if hooks are not available

When you don't have admin access to the repo (for example if you only have contributor access), you don't have the right to see hooks.
Currently that means getting a

Retrieving <user> hooks
API request returned HTTP 404: Not Found

and then the program exits

It should indeed display a warning, but it shouldn't stop there. (you might have the rights to get the hooks in some repo and not others)
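
A minimal sketch of "warn and continue", assuming Python 3 and that the failing request surfaces as a urllib HTTPError. The function name fetch_optional and its parameters are hypothetical:

```python
from urllib.error import HTTPError


def fetch_optional(fetch, description, warn=print):
    """Call fetch(); on HTTP 404 warn and return None instead of aborting."""
    try:
        return fetch()
    except HTTPError as exc:
        if exc.code != 404:
            # Anything other than "not found" is still treated as fatal.
            raise
        warn('Skipping {0}: HTTP {1}'.format(description, exc.code))
        return None
```

The hooks retrieval for each repository could be wrapped this way, so that one inaccessible repository does not end the whole run.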

Remove non-existing Repos from Backup Folder, --delete

If I remove a repository from GitHub, it might be desirable to remove its files from my backups as well. Usually these repos have no value anymore, or they are stored elsewhere.

So, similar to the rsync --delete argument, repositories that no longer exist could be removed from the backup folder.
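
A minimal sketch of finding the candidates for deletion, assuming each repository is backed up into its own subdirectory named after the repository. The function name stale_repositories is hypothetical, and the sketch only lists directories without deleting anything:

```python
import os


def stale_repositories(backup_dir, remote_names):
    """List local repository directories that no longer exist on GitHub."""
    remote = set(remote_names)
    return sorted(
        name for name in os.listdir(backup_dir)
        if os.path.isdir(os.path.join(backup_dir, name))
        and name not in remote
    )
```

Listing first and deleting in a separate, explicit step keeps a failed or partial API response from wiping the backup.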

Backup of private organization does nothing, no errors

github-backup <NAME> --private --organization --output-directory ~/gitbak --all

Retrieving repositories
Filtering repositories
Backing up repositories
Retrieving <NAME> starred repositories
Writing 0 starred repositories to disk
Retrieving <NAME> watched repositories
Writing 0 watched repositories to disk

$ tree ~/gitbak/
/home/user/gitbak/
└── account
    ├── starred.json
    └── watched.json

1 directory, 2 files

It would also be nice to have a --debug option for additional output of what is going on. I can clone our repos via ssh just fine. Leaving out the ssh option did not make a difference.

Does not select any repositories of another user's repositories...

I think it needs to be also changed at https://github.com/josegonzalez/python-github-backup/blob/master/bin/github-backup#L516 like it was for this https://github.com/josegonzalez/python-github-backup/pull/92/files.

kyan@elegiac ~/gh-2018oct05/3 nohup compiz --replace openbox $ github-backup OtherUser -t [redacted] --repositories
Backing up user OtherUser to /home/kyan/gh-2018oct05/3
Retrieving repositories
Requesting https://api.github.com/user/repos?per_page=100&page=1
Requesting https://api.github.com/user/repos?per_page=100&page=2
Requesting https://api.github.com/user/repos?per_page=100&page=3
Filtering repositories
Backing up repositories
kyan@elegiac ~/gh-2018oct05/3 nohup compiz --replace openbox $ 

Add tests

There should be tests for this. It can start off as just a shell script that runs github-backup to make sure it at least can display the help message without erroring. It can later be turned into unit tests but #50 needs to happen first.
