
python-github-backup's Introduction

github-backup


The package can be used to back up an entire GitHub organization, repository or user account, including starred repos, issues and wikis, in the most appropriate format (clones for wikis, JSON files for issues).

Requirements

  • GIT 1.9+
  • Python

Installation

Using PIP via PyPI:

pip install github-backup

Using PIP via Github (more likely the latest version):

pip install git+https://github.com/josegonzalez/python-github-backup.git#egg=github-backup

Install note for python newcomers:

Python scripts are unlikely to be included in your $PATH by default, which means github-backup cannot be run directly in a terminal with $ github-backup .... You can either add Python's script install path to your environment's $PATH or call the script directly, e.g. using $ ~/.local/bin/github-backup.
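
To check where the script ended up, a minimal Python sketch could look like the following. It is not part of github-backup: the helper name locate_script is hypothetical, and the fallback assumes a per-user pip install on Linux or macOS, where scripts usually go into the bin directory under the user base.

```python
import os
import shutil
import site


def locate_script(name="github-backup"):
    """Return the path to `name` if it is on $PATH, else a likely per-user location."""
    found = shutil.which(name)
    if found:
        return found
    # Assumption: `pip install --user` on Linux/macOS puts scripts in <user base>/bin.
    return os.path.join(site.getuserbase(), "bin", name)


if __name__ == "__main__":
    print(locate_script())
```

If the printed path is not inside a directory listed in $PATH, either add that directory to $PATH or call the script by its full path.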

Basic Help

Show the CLI help output:

github-backup -h

CLI Help output:

github-backup [-h] [-u USERNAME] [-p PASSWORD] [-t TOKEN_CLASSIC]
              [-f TOKEN_FINE] [--as-app] [-o OUTPUT_DIRECTORY]
              [-l LOG_LEVEL] [-i] [--starred] [--all-starred]
              [--watched] [--followers] [--following] [--all] [--issues]
              [--issue-comments] [--issue-events] [--pulls]
              [--pull-comments] [--pull-commits] [--pull-details]
              [--labels] [--hooks] [--milestones] [--repositories]
              [--bare] [--lfs] [--wikis] [--gists] [--starred-gists]
              [--skip-archived] [--skip-existing] [-L [LANGUAGES ...]]
              [-N NAME_REGEX] [-H GITHUB_HOST] [-O] [-R REPOSITORY]
              [-P] [-F] [--prefer-ssh] [-v]
              [--keychain-name OSX_KEYCHAIN_ITEM_NAME]
              [--keychain-account OSX_KEYCHAIN_ITEM_ACCOUNT]
              [--releases] [--latest-releases NUMBER_OF_LATEST_RELEASES]
              [--skip-prerelease] [--assets]
              [--exclude [REPOSITORY [REPOSITORY ...]]]
              [--throttle-limit THROTTLE_LIMIT] [--throttle-pause THROTTLE_PAUSE]
              USER

Backup a github account

positional arguments:
  USER                  github username

optional arguments:
  -h, --help            show this help message and exit
  -u USERNAME, --username USERNAME
                        username for basic auth
  -p PASSWORD, --password PASSWORD
                        password for basic auth. If a username is given but
                        not a password, the password will be prompted for.
  -f TOKEN_FINE, --token-fine TOKEN_FINE
                        fine-grained personal access token or path to token
                        (file://...)
  -t TOKEN_CLASSIC, --token TOKEN_CLASSIC
                        personal access, OAuth, or JSON Web token, or path to
                        token (file://...)
  --as-app              authenticate as github app instead of as a user.
  -o OUTPUT_DIRECTORY, --output-directory OUTPUT_DIRECTORY
                        directory at which to backup the repositories
  -l LOG_LEVEL, --log-level LOG_LEVEL
                        log level to use (default: info, possible levels:
                        debug, info, warning, error, critical)
  -i, --incremental     incremental backup
  --starred             include JSON output of starred repositories in backup
  --all-starred         include starred repositories in backup [*]
  --watched             include JSON output of watched repositories in backup
  --followers           include JSON output of followers in backup
  --following           include JSON output of following users in backup
  --all                 include everything in backup (not including [*])
  --issues              include issues in backup
  --issue-comments      include issue comments in backup
  --issue-events        include issue events in backup
  --pulls               include pull requests in backup
  --pull-comments       include pull request review comments in backup
  --pull-commits        include pull request commits in backup
  --pull-details        include more pull request details in backup [*]
  --labels              include labels in backup
  --hooks               include hooks in backup (works only when
                        authenticated)
  --milestones          include milestones in backup
  --repositories        include repository clone in backup
  --bare                clone bare repositories
  --lfs                 clone LFS repositories (requires Git LFS to be
                        installed, https://git-lfs.github.com) [*]
  --wikis               include wiki clone in backup
  --gists               include gists in backup [*]
  --starred-gists       include starred gists in backup [*]
  --skip-existing       skip project if a backup directory exists
  -L [LANGUAGES [LANGUAGES ...]], --languages [LANGUAGES [LANGUAGES ...]]
                        only allow these languages
  -N NAME_REGEX, --name-regex NAME_REGEX
                        python regex to match names against
  -H GITHUB_HOST, --github-host GITHUB_HOST
                        GitHub Enterprise hostname
  -O, --organization    whether or not this is an organization user
  -R REPOSITORY, --repository REPOSITORY
                        name of repository to limit backup to
  -P, --private         include private repositories [*]
  -F, --fork            include forked repositories [*]
  --prefer-ssh          Clone repositories using SSH instead of HTTPS
  -v, --version         show program's version number and exit
  --keychain-name OSX_KEYCHAIN_ITEM_NAME
                        OSX ONLY: name field of password item in OSX keychain
                        that holds the personal access or OAuth token
  --keychain-account OSX_KEYCHAIN_ITEM_ACCOUNT
                        OSX ONLY: account field of password item in OSX
                        keychain that holds the personal access or OAuth token
  --releases            include release information, not including assets or
                        binaries
  --latest-releases NUMBER_OF_LATEST_RELEASES
                        include certain number of the latest releases;
                        only applies if including releases
  --skip-prerelease     skip prerelease and draft versions; only applies if
                        including releases
  --assets              include assets alongside release information; only
                        applies if including releases
  --exclude [REPOSITORY [REPOSITORY ...]]
                        names of repositories to exclude from backup.
  --throttle-limit THROTTLE_LIMIT
                        start throttling of GitHub API requests after this
                        amount of API requests remain
  --throttle-pause THROTTLE_PAUSE
                        wait this amount of seconds when API request
                        throttling is active (default: 30.0, requires
                        --throttle-limit to be set)

Usage Details

Authentication

Password-based authentication will fail if you have two-factor authentication enabled, and will be deprecated by 2023 EOY.

--username is used for basic password authentication and separate from the positional argument USER, which specifies the user account you wish to back up.

Classic tokens are slightly less secure as they provide very coarse-grained permissions.

If you need authentication for long-running backups (e.g. for a cron job), it is recommended to use a fine-grained personal access token (-f TOKEN_FINE).
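
For unattended runs it can help to keep the token out of the command line. The help output lists a file://... form for both token options. The sketch below only illustrates how such a value might be resolved; the function name resolve_token and the exact parsing are assumptions, not the tool's actual implementation.

```python
def resolve_token(value):
    """Return the token itself, or the first line of the file named by a file:// value."""
    prefix = "file://"
    if value.startswith(prefix):
        # Assumption: everything after the prefix is a filesystem path.
        with open(value[len(prefix):], encoding="utf-8") as handle:
            return handle.readline().strip()
    return value
```

A token file used this way should be readable only by the account that runs the backup.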

Fine Tokens

You can "generate new token", choosing the repository scope by selecting specific repos or all repos. On Github this is under Settings -> Developer Settings -> Personal access tokens -> Fine-grained Tokens

Customise the permissions for your use case, but for a personal account full backup you'll need to enable the following permissions:

User permissions: Read access to followers, starring, and watching.

Repository permissions: Read access to code, commit statuses, issues, metadata, pages, pull requests, and repository hooks.

Prefer SSH

If cloning repos is enabled with --repositories, --all-starred, --wikis, --gists or --starred-gists, the --prefer-ssh argument makes github-backup clone the git repos over SSH. All other connections still use their own protocol, e.g. API requests for issues use HTTPS.

To clone with SSH, you'll need SSH authentication setup as usual with Github, e.g. via SSH public and private keys.

Using the Keychain on Mac OSX

Note: On Mac OSX the token can be stored securely in the user's keychain. To do this:

  1. Open Keychain from "Applications -> Utilities -> Keychain Access"
  2. Add a new password item using "File -> New Password Item"
  3. Enter a name in the "Keychain Item Name" box. You must provide this name to github-backup using the --keychain-name argument.
  4. Enter your Github username, as set above, in the "Account Name" box. You must provide this name to github-backup using the --keychain-account argument.
  5. Enter your Github personal access token in the "Password" box

Note: When you run github-backup, you will be asked whether you want to allow "security" to use your confidential information stored in your keychain. You have two options:

  1. Allow: In this case you will need to click "Allow" each time you run github-backup
  2. Always Allow: In this case, you will not be asked for permission when you run github-backup in future. This is less secure, but is required if you want to schedule github-backup to run automatically
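
The "security" named in the prompt is the macOS security command-line tool. As a rough sketch of an equivalent lookup (the helper names are hypothetical, and this is not necessarily how github-backup itself calls the tool), the item created in the steps above could be read like this:

```python
import subprocess


def keychain_lookup_command(item_name, account):
    """Build a `security` call that prints only the password of a generic item."""
    return ["security", "find-generic-password", "-s", item_name, "-a", account, "-w"]


def read_keychain_token(item_name, account):
    """Run the lookup (macOS only) and return the stored token."""
    result = subprocess.run(
        keychain_lookup_command(item_name, account),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Running the lookup by hand is a quick way to confirm that the item name and account name match what is passed to --keychain-name and --keychain-account.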

Github Rate-limit and Throttling

"github-backup" will automatically throttle itself based on feedback from the Github API.

Their API is usually rate-limited to 5000 calls per hour. The API will ask github-backup to pause until a specific time when the limit is reset again (at the start of the next hour). This continues until the backup is complete.

During a large backup, such as --all-starred, and on a fast connection this can result in (~20 min) pauses with bursts of API calls periodically maxing out the API limit. If this is not suitable it has been observed under real-world conditions that overriding the throttle with --throttle-limit 5000 --throttle-pause 0.6 provides a smooth rate across the hour, although a --throttle-pause 0.72 (3600 seconds [1 hour] / 5000 limit) is theoretically safer to prevent large rate-limit pauses.
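
The arithmetic behind these two pause values can be checked with a short sketch (the function names are illustrative only). It assumes the 5000 calls per hour limit mentioned above and ignores the time each request itself takes, which is presumably why 0.6 seconds works in practice.

```python
def even_pause_seconds(hourly_limit=5000, window_seconds=3600):
    """Pause between calls that spreads `hourly_limit` calls evenly over the window."""
    return window_seconds / hourly_limit


def max_calls_per_hour(pause_seconds, window_seconds=3600):
    """Upper bound on calls per window if a call cost nothing besides the pause."""
    return window_seconds / pause_seconds


print(even_pause_seconds())            # 0.72
print(round(max_calls_per_hour(0.6)))  # 6000
```

With a 0.6 second pause the theoretical ceiling is 6000 calls per hour, above the limit, while 0.72 seconds caps it at exactly 5000.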

About Git LFS

When you use the --lfs option, you will need to make sure you have Git LFS installed.

Instructions on how to do this can be found on https://git-lfs.github.com.

Run in Docker container

To run the tool in a Docker container use the following command:

sudo docker run --rm -v /path/to/backup:/data --name github-backup ghcr.io/josegonzalez/python-github-backup -o /data $OPTIONS $USER

Gotchas / Known-issues

All is not everything

The --all argument does not include: cloning private repos (-P, --private), cloning forks (-F, --fork), cloning starred repositories (--all-starred), --pull-details, cloning LFS repositories (--lfs), cloning gists (--gists) or cloning starred gist repos (--starred-gists). See examples for more.
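
As a small illustration, the flags that the help output marks with [*] can be kept in one list and appended to --all when building a command. This is only a sketch: the names NOT_IMPLIED_BY_ALL and everything_command are made up, and authentication options are left out.

```python
import shlex

# Flags the help output marks with [*], i.e. not implied by --all.
NOT_IMPLIED_BY_ALL = [
    "--private", "--fork", "--all-starred", "--pull-details",
    "--lfs", "--gists", "--starred-gists",
]


def everything_command(user, output_directory):
    """Build an argument list that combines --all with the separately marked flags."""
    return ["github-backup", user, "-o", output_directory, "--all", *NOT_IMPLIED_BY_ALL]


print(shlex.join(everything_command("YOUR-GITHUB-USER", "/tmp/github-backup")))
```

Note that --all together with --all-starred is also the combination described under the known blocking errors, so treat this as a map of which flags are separate rather than a recommended invocation.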

Cloning all starred size

Using the --all-starred argument to clone all starred repositories may use a large amount of storage space, especially if --all or more arguments are used. e.g. commonly starred repos can have tens of thousands of issues, many large assets and the repo itself etc. Consider just storing links to starred repos in JSON format with --starred.

Incremental Backup

Using (-i, --incremental) will only request new data from the API since the last run (successful or not). e.g. only request issues from the API since the last run.

This means any blocking errors on previous runs can cause a large amount of missing data in backups.

Known blocking errors

Some errors will block the backup run by exiting the script. e.g. receiving a 403 Forbidden error from the Github API.

If the incremental argument is used, the next backup will only request API data since the last blocked/failed run, potentially causing unexpectedly large amounts of missing data.

It's therefore recommended to only use the incremental argument if the output/result is being actively monitored, or complemented with periodic full non-incremental runs, to avoid unexpected missing data in regular backup runs.

  1. Starred public repo hooks blocking

    Since the --all argument includes --hooks, if you use --all and --all-starred together to clone a user's starred public repositories, the backup will likely error and be blocked from continuing.

    This is due to needing the correct permission for --hooks on public repos.

  2. Releases blocking

    A known --releases (required for --assets) error will sometimes block the backup.

    If you're backing up a lot of repositories with releases, e.g. an organisation or --all-starred, you may need to remove --releases (and therefore --assets) to complete a backup. Documented in issue 209.
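
One way to act on the advice about periodic full runs is to let a scheduled wrapper decide per day whether to pass -i. The sketch below is an assumption about how such a wrapper might look, not something github-backup provides: it presumes one run per day and picks Sunday for the full run.

```python
import datetime


def incremental_flags(today, full_run_weekday=6):
    """Return ["-i"] on most days and [] (a full run) on the chosen weekday.

    Assumption: one scheduled run per day; weekday 6 is Sunday in
    datetime.date.weekday() numbering.
    """
    return [] if today.weekday() == full_run_weekday else ["-i"]


print(incremental_flags(datetime.date.today()))
```

The returned list can be appended to the rest of the github-backup arguments, so a blocked run is followed by a full run within a week at the latest.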

"bare" is actually "mirror"

Using the bare clone argument (--bare) will actually call git's clone --mirror command. There's a subtle difference between bare and mirror clone.

From git docs "Compared to --bare, --mirror not only maps local branches of the source to local branches of the target, it maps all refs (including remote-tracking branches, notes etc.) and sets up a refspec configuration such that all these refs are overwritten by a git remote update in the target repository."

Starred gists vs starred repo behaviour

The starred normal repo cloning (--all-starred) argument stores starred repos separately from the user's own repositories. However, using --starred-gists will store starred gists within the same directory as the user's own gists (--gists). Also, all gist repo directory names are IDs, not the gist's name.

Skip existing on incomplete backups

The --skip-existing argument will skip a backup if the directory already exists, even if the backup in that directory failed (perhaps due to a blocking error). This may result in unexpected missing data in a regular backup.

Github Backup Examples

Backup all repositories, including private ones using a classic token:

export ACCESS_TOKEN=SOME-GITHUB-TOKEN
github-backup WhiteHouse --token $ACCESS_TOKEN --organization --output-directory /tmp/white-house --repositories --private

Use a fine-grained access token to backup a single organization repository with everything else (wiki, pull requests, comments, issues etc):

export FINE_ACCESS_TOKEN=SOME-GITHUB-TOKEN
ORGANIZATION=docker
REPO=cli
# e.g. git@github.com:docker/cli.git
github-backup $ORGANIZATION -P -f $FINE_ACCESS_TOKEN -o . --all -O -R $REPO

Quietly and incrementally backup useful Github user data (public and private repos with SSH) including; all issues, pulls, all public starred repos and gists (omitting "hooks", "releases" and therefore "assets" to prevent blocking). Great for a cron job.

export FINE_ACCESS_TOKEN=SOME-GITHUB-TOKEN
GH_USER=YOUR-GITHUB-USER

github-backup -f $FINE_ACCESS_TOKEN --prefer-ssh -o ~/github-backup/ -l error -P -i --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-commits --labels --milestones --repositories --wikis --pull-details --gists --starred-gists $GH_USER

Debug an error/block or incomplete backup into a temporary directory. Omit "incremental" to fill a previous incomplete backup.

export FINE_ACCESS_TOKEN=SOME-GITHUB-TOKEN
GH_USER=YOUR-GITHUB-USER

github-backup -f $FINE_ACCESS_TOKEN -o /tmp/github-backup/ -l debug -P --all-starred --starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-commits --labels --milestones --repositories --wikis --releases --assets --pull-details --gists --starred-gists $GH_USER

Development

This project is considered feature complete for the primary maintainer @josegonzalez. If you would like a bugfix or enhancement, pull requests are welcome. Feel free to contact the maintainer for consulting estimates if you'd like to sponsor the work instead.

Contributors

A huge thanks to all the contributors!


Testing

This project currently contains no unit tests. To run linting:

pip install flake8
flake8 --ignore=E501

python-github-backup's Issues

--starred-gists will never work for the specified user

With #87 and #92 I realized that there is no API to get the starred gists of another user. This means the --starred-gists will never work for the specified user, only the logged in user.

I'm curious what you think the solution here should be. I'm thinking:

  1. Document this, maybe print a warning if you use --starred-gists
  2. Remove the feature

failing on windows

python 3.6
git 2.11

Running the following command:

github-backup.py -t oauthtoken -o backupPath --all -P -O orguser

It runs, downloads a repo, then terminates with the errors below. If I add --skip-existing, it downloads the next repo but then terminates with the same message. The behavior is the same if I change --all to --repositories.

github-backup.py : Traceback (most recent call last):
At line:1 char:1
github-backup.py -t $OAuthToken -o $Backupfolder --repositories -P -O --skip-exi ...

     CategoryInfo          : NotSpecified: (Traceback (most recent call last)::String) [], RemoteException
     FullyQualifiedErrorId : NativeCommandError
 
  File "C:\Program Files (x86)\Python36-32\Scripts\github-backup.py", line 815, in <module>
    main()
  File "C:\Program Files (x86)\Python36-32\Scripts\github-backup.py", line 810, in main
    backup_repositories(args, output_directory, repositories)
  File "C:\Program Files (x86)\Python36-32\Scripts\github-backup.py", line 518, in backup_repositories
    bare_clone=args.bare_clone)
  File "C:\Program Files (x86)\Python36-32\Scripts\github-backup.py", line 750, in fetch_repository
    logging_subprocess(git_command, None)
  File "C:\Program Files (x86)\Python36-32\Scripts\github-backup.py", line 89, in logging_subprocess
    check_io()
  File "C:\Program Files (x86)\Python36-32\Scripts\github-backup.py", line 79, in check_io
    1000)[0]
OSError: [WinError 10038] An operation was attempted on something that is not a socket

Pull requests are stored twice

Since pull requests are also issues in Github, pull requests are currently downloaded twice: once as an issue and once as a pull request.
When retrieved as an issue, some information is missing from the API response, if I checked correctly. But you can differentiate issues from pull requests in the API response by the key pull_request, according to the docs at https://developer.github.com/v3/issues/.

I wonder if it makes sense to drop all issues from the response which have the pull_request key set. At least if retrieving pull requests separately is requested by the user.

If accepted, I could make a PR.
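
The filtering proposed in this issue could be sketched as follows. The function name and the sample data are made up for illustration; only the pull_request key comes from the API docs referenced above.

```python
def drop_pull_requests(issues):
    """Keep only items from the issues endpoint that are not pull requests."""
    return [issue for issue in issues if "pull_request" not in issue]


# Made-up sample shaped like entries from the issues endpoint.
sample = [
    {"number": 1, "title": "A bug"},
    {"number": 2, "title": "A fix", "pull_request": {"url": "..."}},
]
print([issue["number"] for issue in drop_pull_requests(sample)])  # [1]
```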

Existing password item in OSX keychain not found

I followed all the steps in the doc on creating an OSX keychain password item, but when I run the following command: github-backup $ORG --keychain-name ghbackup --keychain-account laghee -o ./data --issues --issue-events --milestones --labels -R $REPO

I keep getting this error:

No password item matching the provided name and account could be found in the osx keychain.

This is my password item:

[Two screenshots of the password item, captured 2018-07-02]

I'm on MacOS HighSierra (10.13.5).

Is there some other field that needs to be modified that isn't mentioned in the docs?

Since I only need to run one backup at the moment, I directly entered my credentials with the --username and --token flags, and that worked fine. But if I ever had to run several or set a script to back up automatically ... this would be a problem.

Feature Request: Milestone and Label backup

Thank you for this tool. It is working great for me to backup repositories, wikis, and issues. Two things I noticed missing that prevent it from being a complete backup are lists of milestones and labels for a repository. I know this information is essentially contained within the issues, but it would be useful to have them in their own separate lists, especially to retain labels/milestones without associated issues.

The GitHub API has simple endpoints for each:

  • Milestones - GET /repos/:owner/:repo/milestones?state=all
  • Labels - GET /repos/:owner/:repo/labels
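
As a sketch of how the two endpoints above could be addressed from Python (the helper names are hypothetical, and only the URLs are built here, no request is sent):

```python
from urllib.parse import quote, urlencode

API_ROOT = "https://api.github.com"


def milestones_url(owner, repo):
    """URL for all milestones (open and closed) of a repository."""
    query = urlencode({"state": "all"})
    return f"{API_ROOT}/repos/{quote(owner)}/{quote(repo)}/milestones?{query}"


def labels_url(owner, repo):
    """URL for all labels of a repository."""
    return f"{API_ROOT}/repos/{quote(owner)}/{quote(repo)}/labels"


print(milestones_url("josegonzalez", "python-github-backup"))
print(labels_url("josegonzalez", "python-github-backup"))
```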

Thanks for your consideration!

Restore Option

Can we restore the organisation to another account after a successful backup? If yes, please provide the process.

Add cross platform credential store support

Description

Release 0.13.0 added support for fetching a Github personal access token (PAT) that had been previously stored in the OSX Keychain. It would be good to extend this support to Windows and Linux also.

Notes

Existing Python packages

It looks like functionality to interact with system credential stores for various operating systems from Python already exists. For example the Python keyring package (Github repository) supports OSX Keychain, Windows Credential Vault, Freedesktop Secret Service (requires secretstorage) and KWallet (requires dbus).

By using something like keyring for the token storage, cross-platform support should be easy to add.

Potential improvement to user credential provision

If we can support the system credential stores for the main operating systems, we could potentially streamline the user experience for provision of credentials. However, if we create a new user workflow for this, we need to ensure we don't get in the way of users who manage their credentials using git itself (e.g. SSH keys or using a credential.helper).

A potential option might be to deprecate the --keychain-name and --keychain-account arguments in favour of an argument that tells github-backup to look in the system credential store for a PAT. This will preserve the existing behaviour of proceeding to call the Github API / git with no credentials if no credential-related arguments are passed to github-backup. Two potential options for this are:

  1. Add a new boolean argument (e.g. --use-system-keyring)
  2. Extend the -t token argument to interpret a special value as an instruction to use the system credential store (e.g. -t use-system-keyring)

When the new argument is provided, a potential user interaction workflow might be:

  • Automatically look for a default credential name + username pair in the system credential store. I would suggest using github-backup for the credential name and the Github username argument (-u) as the username.
  • If the credential exists, then retrieve the PAT.
  • If the credential does not exist, then prompt the user to enter a PAT at the command prompt and create a new credential using the default credential name + username so they will not be prompted for it next time.
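
The workflow in the bullets above could be sketched like this. It is an illustration of the proposal, not existing github-backup code: the function name is hypothetical, and the store is passed in so that the keyring module (whose get_password and set_password functions take these arguments) or any stand-in can be used.

```python
SERVICE_NAME = "github-backup"  # default credential name suggested above


def get_or_create_token(username, store, prompt):
    """Fetch a PAT from `store`, prompting for and saving one if it is missing.

    `store` needs get_password(service, username) and
    set_password(service, username, password). `prompt` is any callable
    that takes a message and returns the PAT.
    """
    token = store.get_password(SERVICE_NAME, username)
    if token is None:
        token = prompt("GitHub personal access token: ")
        store.set_password(SERVICE_NAME, username, token)
    return token
```

With the keyring package installed, a call might look like get_or_create_token("some-user", keyring, getpass.getpass), so the user is only prompted on the first run.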

@josegonzalez What are your thoughts on the above user interaction change suggestion?

NameError: global name 'auth' is not defined

When running python-github-backup I get the following error:

Retrieving repositories
Traceback (most recent call last):
  File "/home/cg/.local/bin/github-backup", line 666, in <module>
    main()
  File "/home/cg/.local/bin/github-backup", line 659, in main
    repositories = retrieve_repositories(args)
  File "/home/cg/.local/bin/github-backup", line 389, in retrieve_repositories
    return retrieve_data(args, template, single_request=single_request)
  File "/home/cg/.local/bin/github-backup", line 259, in retrieve_data
    r, errors = _get_response(request, template)
  File "/home/cg/.local/bin/github-backup", line 302, in _get_response
    errors, should_continue = _request_http_error(exc, auth, errors)  # noqa
NameError: global name 'auth' is not defined

--incremental mode not working

Is the '--incremental' mode supposed to work? After performing a '--all'-backup, subsequent incremental backups do not work. Reproducibly with this error message. ([xxx] redacted by me)

Backing up user [xxx] to [xxx]
Retrieving repositories
Filtering repositories
Backing up repositories
Traceback (most recent call last):
  File "/usr/local/bin/github-backup", line 834, in <module>
    main()
  File "/usr/local/bin/github-backup", line 829, in main
    backup_repositories(args, output_directory, repositories)
  File "/usr/local/bin/github-backup", line 498, in backup_repositories
    last_update = max(repository['updated_at'] for repository in repositories) # noqa
ValueError: max() arg is an empty sequence

I use Python 2.7.13.
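
The traceback shows max() being called on an empty sequence, which suggests the repository list was empty at that point. A minimal sketch of a guard is shown below; it is an illustration rather than the project's actual fix, and the function name latest_update is hypothetical.

```python
def latest_update(repositories):
    """Return the newest `updated_at` value, or None when the list is empty."""
    timestamps = [repository["updated_at"] for repository in repositories]
    return max(timestamps) if timestamps else None
```

The explicit emptiness check is used instead of max(..., default=None) because the default argument is not available in Python 2.7.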

Restoring backups

In issue #38 I read that this program doesn't include anything to restore the repository. Have any attempts been made to create such a script (by third parties perhaps), or is it expected to be added to this program in the future?

No such file or directory

After installing github-backup via pip install github-backup or this git clone ...
bash: /usr/bin/github-backup: No such file or directory

Distribution: Ubuntu 16.04

Maybe it's because I had previously installed another, slightly older tool named github-backup.

Fixed by
cp /usr/local/bin/github-backup /usr/bin/github-backup

Question - how to backup a private repository from an organisation.

Hi, I'm an owner of an organization (https://github.com/CEDIA-models) and I want to back up issues from a single private repository.

My idea was to log in with my account (francoislauger) and access the organization directly:
github-backup -u USER francoislauger -O -P -R CEDIA-models/REPO --issues --issue-comments
I get a prompt for a password, but after that I get this:

Requesting https://api.github.com/user?per_page=100&page=1
API request returned HTTP 403: Forbidden

I'm sure I'm doing something wrong, but I'm not sure what.

Backup support for organization accounts?

Seems to work well for my personal account.

However we have an organization account (no username/password, no personal token)

I tried running my own account's token against it - but it found nothing.

I really was hoping to use this to back up our organization's account.

(Also, any plans for a bitbucket-backup? I assume it's probably quite similar.)

Skipping YYY (https://xxxxx@xxx:*****@github.com/XXX/YYY.git) since it's not initalized

If I run

~/.local/bin/github-backup -u xxxxxx@xxx XXX -o repos  --all --private

From my account on an organisation where I have access to a set of private repos but have never forked or cloned anything I get a message like

Skipping YYY (https://xxxxx@xxx:*****@github.com/XXX/YYY.git) since it's not initalized

The repos listed with this message are not backed up, yet the script still exits with a success (0) return code.

Looking at the repos that I am backing up, they all have content and are initialised, so I don't understand this message. I assume it's not telling me that the target repository is uninitialised, since I'd expect the tool to create that itself.

Does not select any repositories of another user's repositories...

I think it needs to be also changed at https://github.com/josegonzalez/python-github-backup/blob/master/bin/github-backup#L516 like it was for this https://github.com/josegonzalez/python-github-backup/pull/92/files.

kyan@elegiac ~/gh-2018oct05/3 nohup compiz --replace openbox $ github-backup OtherUser -t [redacted] --repositories
Backing up user OtherUser to /home/kyan/gh-2018oct05/3
Retrieving repositories
Requesting https://api.github.com/user/repos?per_page=100&page=1
Requesting https://api.github.com/user/repos?per_page=100&page=2
Requesting https://api.github.com/user/repos?per_page=100&page=3
Filtering repositories
Backing up repositories
kyan@elegiac ~/gh-2018oct05/3 nohup compiz --replace openbox $ 

Unable to restore issues and wiki

I took a backup of one of my repositories, which contains source files, issues and a wiki.
I used github-backup to back up my repository. I restored the same repository by pushing the backup
repository to origin master. I got my source files, but not the issues and wiki.

Can anyone help me with this?

Remove non-existing Repos from Backup Folder, --delete

If I remove a repository from GitHub, it might be desirable to remove its files from my backups as well. Usually these repos have no value anymore, or they are stored elsewhere.

So, similar to the rsync --delete argument, repositories that no longer exist could be removed from the backup folder.

Calls always timing out

I was having an issue running the code. It was a silly system config error I ran into on a new dev environment. Since the code throws a misleading error, I'm putting the issue here in case other people run into it.

TLDR:

If you keep seeing this

https://api.github.com/user timed out
https://api.github.com/user timed out
https://api.github.com/user timed out
https://api.github.com/user timed out

but know you are connected to the network

Then run

cd /Applications/Python\ 3.7/
./Install\ Certificates.command

(change 3.7 to whatever version of python 3 you are using)

More detail:

Python 3.7 does not rely on macOS's OpenSSL anymore. It comes with its own bundled OpenSSL and doesn't have access to macOS's root certificates.

In _get_response() the detail of the URLError is swallowed, which leads you to believe your call is timing out:

except URLError:
    should_continue = _request_url_error(template, retry_timeout)
    if not should_continue:
        raise

I logged the error

except URLError as e:
    log_error(e.reason)
    should_continue = _request_url_error(template, retry_timeout)
    if not should_continue:
        raise

and saw
[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed
I ran the above commands and everything is working

Thanks to the post here that set me straight
https://stackoverflow.com/questions/40684543/how-to-make-python-use-ca-certificates-from-mac-os-truststore

--all option does not include everything

Thanks for the tool, it's excellent 👍


I just got confused by the description of --all: the meaning of "everything" seems to be very user-centric, as I had to add flags to get the forked and watched repos.

Other than that all went well and worked like a charm.
Thanks again for the good work!

Wrong user's starred repositories downloaded

Hi! I didn't let it get very far before killing it, but when I ran the command like this:

github-backup -u ethus3h -t [token] --all --repositories -P -F --gists --starred-gists --hooks --milestones --labels --bare --lfs --wikis --starred --all-starred --watched --followers --following --issues --issue-comments --issue-events --pulls --pull-comments --pull-commits --pull-details --hooks SomeoneElsesUsername

It started downloading all of my (ethus3h's) starred repositories, instead of the specified username's. I tried not including the -u and -t options, but it gave 401 Unauthorized.

This seems to prevent downloading the specified user's starred repositories.

It looks like this also affects starred gists.

Backup "watched" repositories

I thought the --watched option ("include watched repositories in backup") would back up the watched repositories themselves, but it actually only backs up a list of watched repositories.

On GitHub you often contribute to repositories that are not in your own namespace and then watch them.
It would be useful to be able to back up all these repositories (as a new option, for example).

Args assigns users as backup target after PR #97

Command, with extra output printed to highlight the issue:

[logged_in_user@host myorg-github-backup]$ pipenv run github-backup MYORG -u mtdeguzis --prefer-ssh --organization --private --output-directory=/hadoop/backups/github/github-backup --all
Backing up user MYORG to /hadoop/backups/github/github-backup
Password: 
Requesting https://api.github.com/user?per_page=100&page=1
Retrieving repositories
ARGS: Namespace(all_starred=False, bare_clone=False, fork=False, github_host=None, include_everything=True, include_followers=False, include_following=False, include_gists=False, include_hooks=False, include_issue_comments=False, include_issue_events=False, include_issues=False, include_labels=False, include_milestones=False, include_pull_comments=False, include_pull_commits=False, include_pull_details=False, include_pulls=False, include_repository=False, include_starred=False, include_starred_gists=False, include_watched=False, include_wiki=False, incremental=False, languages=None, lfs_clone=False, name_regex=None, organization=True, osx_keychain_item_account=None, osx_keychain_item_name=None, output_directory='/hadoop/backups/github/github-backup', password='SOME_PASS', prefer_ssh=True, private=True, repository=None, skip_existing=False, token=None, user='MYORG', username='mtdeguzis')
USER: MYORG
AUTH USER: mtdeguzis

I printed out some extra info to show the issue.

Code block:

def retrieve_repositories(args, authenticated_user):
    log_info('Retrieving repositories')
    single_request = False
    if args.user == authenticated_user['login']:
        # we must use the /user/repos API to be able to access private repos
        template = 'https://{0}/user/repos'.format(
            get_github_api_host(args))
    else:
        if args.private:
            log_error('Authenticated user is different from user being backed up, thus private repositories cannot be accessed')
        template = 'https://{0}/users/{1}/repos'.format(
            get_github_api_host(args),
            args.user)

    if args.organization:
        template = 'https://{0}/orgs/{1}/repos'.format(
            get_github_api_host(args),
            args.user)

    if args.repository:
        single_request = True
        template = 'https://{0}/repos/{1}/{2}'.format(
            get_github_api_host(args),
            args.user,
            args.repository)

I believe the issue is with this line:

    if args.user == authenticated_user['login']:

Should be:

    if args.username == authenticated_user['login']:

Otherwise, the script assumes MYORG is the username. When I changed the check to args.username, things ran as expected.
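
To make the precedence between these checks easier to see, here is a rough sketch of the same endpoint selection as a pure function. The function name repos_endpoint and its parameters are hypothetical; the URL templates are copied from the code block above. It checks the repository and organization cases first, so the user-versus-login comparison is only reached for plain user backups. This is an illustration of the idea, not the project's actual fix.

```python
def repos_endpoint(api_host, target, login, organization=False, repository=None):
    """Pick the API URL used to list repositories for a backup target.

    Hypothetical helper; the URL templates mirror the snippet quoted above.
    target is the account being backed up, login is the authenticated user.
    """
    if repository:
        # A single repository was requested explicitly.
        return 'https://{0}/repos/{1}/{2}'.format(api_host, target, repository)
    if organization:
        # Organization backups never need the user/login comparison.
        return 'https://{0}/orgs/{1}/repos'.format(api_host, target)
    if target == login:
        # /user/repos is needed to see the authenticated user's private repos.
        return 'https://{0}/user/repos'.format(api_host)
    return 'https://{0}/users/{1}/repos'.format(api_host, target)
```

With the values from the output above, repos_endpoint('api.github.com', 'MYORG', 'mtdeguzis', organization=True) returns the /orgs/MYORG/repos URL without ever comparing MYORG to the login.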

Update clone if local repository already exists

Upon re-running the script on the same output directory, with these options --all --all-starred --private --fork --bare --incremental, I noticed that for repositories that already existed, git would return error 128.

I checked the error myself by running the same command in a separate terminal window, and noticed that git was failing to clone because "the destination path already exists and is not an empty directory".

Is there any way to have the script run git remote update for already existing repositories? Or, is there any way to force git into clobbering the directory (though this might be less efficient than pulling just the changes, since it would be cloning everything from scratch)?
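
A minimal sketch of the behaviour I have in mind, assuming a helper that decides between cloning and updating. The name git_sync_command is hypothetical and this is not how github-backup currently works; it only builds the command so the decision is easy to test.

```python
import os


def git_sync_command(remote_url, local_dir, bare=True):
    """Return (argv, cwd) for bringing local_dir up to date with remote_url.

    Hypothetical helper for illustration; not github-backup's actual logic.
    """
    if os.path.isdir(local_dir) and os.listdir(local_dir):
        # Existing, non-empty clone: fetch changes instead of cloning again.
        return ['git', 'remote', 'update'], local_dir
    clone = ['git', 'clone']
    if bare:
        clone.append('--bare')
    # Fresh clone: git creates local_dir itself, so no working directory is set.
    return clone + [remote_url, local_dir], None
```

The result could then be passed to subprocess.check_call(argv, cwd=cwd), which avoids the "destination path already exists" failure on re-runs.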

Refactor to be more easily importable

The github-backup package cannot be imported because of how it's named. I think most of the code should be moved to the github_backup source code, and the executable script file should be minimal. This will help with #49 and with writing code that calls this as a library.

Possible silent data loss with Git < 1.9.0 on git fetch

Hello,

python-github-backup invokes git fetch with the --tags option. The meaning of the --tags option changed in Git 1.9.0:

The meanings of the "--tags" option to "git fetch" has changed; the command fetches tags in addition to what is fetched by the same command line without the option.

What you have now should produce expected results on later Git releases, but any of your users with earlier Git releases may unknowingly be relying upon incomplete backups.

On these earlier Git releases, git fetch --tags will only fetch objects reachable from tag refs, not branch refs. In a repo with no tags, this means that nothing will be fetched. This is particularly dangerous because the backup will look hunky dory at the time the user first experiments with python-github-backup (the initial git clone will fetch everything) and because no errors will be reported from then on. (To be sure, this is not an erroneous condition from Git's perspective, hence, no errors.)

This is easy enough to fix for future users. Some options I can think of:

  • Stick a prominent caveat in your README with a suggested Git version requirement.
  • Modify python-github-backup to parse a version string from git --version and bail out with a conspicuous failure message if it is earlier than 1.9.0. (The better safe than sorry approach.)
  • Replace the --tags option with something that will work on most Git releases still in production circulation today. Perhaps an explicit set of refspecs.

Warning existing users of this problem may prove more difficult.

I am unsure which Git releases are distributed with Debian-derived LTS operating systems, but I do know that RHEL 7-derived operating systems ship with Git 1.8. Anyone running an enterprise distro is likely to get burnt here.
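
The second option (parse the version and bail out) could look roughly like the sketch below. The names parse_git_version, ensure_supported_git and MINIMUM_GIT_VERSION are my own and hypothetical; only the 1.9.0 threshold comes from the release note quoted above.

```python
import re
import subprocess

MINIMUM_GIT_VERSION = (1, 9, 0)


def parse_git_version(output):
    """Extract (major, minor, patch) from `git --version` output."""
    match = re.search(r'(\d+)\.(\d+)(?:\.(\d+))?', output)
    if match is None:
        raise ValueError('unrecognised git version string: {0!r}'.format(output))
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def ensure_supported_git():
    """Exit with a conspicuous message if the installed Git is too old."""
    output = subprocess.check_output(['git', '--version']).decode('utf-8')
    if parse_git_version(output) < MINIMUM_GIT_VERSION:
        raise SystemExit('Git 1.9.0 or later is required, found: ' + output.strip())
```

Calling ensure_supported_git() once at startup would turn the silent incomplete fetch into an explicit failure on older releases.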

create missing directories for specified output path

Hi, it would be great if we also created the directory (and intermediate directories) for an explicitly specified -o path, just like the default does with creating the repositories directory.

(ghbackup)jchen@rousseau> github-backup -t <token> --repositories -o fly fly
Specified output directory is not a directory: /Users/jchen/tmp/gh/fly
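
Something along these lines is what I mean, as a minimal sketch. The helper name ensure_output_directory is hypothetical; it reuses the error text shown above for the case where the path exists but is a file.

```python
import os


def ensure_output_directory(path):
    """Create path and any missing parents; reject non-directory paths.

    Hypothetical helper for illustration; not part of github-backup.
    """
    if os.path.exists(path) and not os.path.isdir(path):
        raise NotADirectoryError(
            'Specified output directory is not a directory: {0}'.format(path))
    # exist_ok avoids an error when the directory is already there.
    os.makedirs(path, exist_ok=True)
    return os.path.abspath(path)
```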

How to prevent hitting API call rate limit

My organization has a lot of GitHub data that we want to perform nightly backups of to a Drobo. I have been attempting to use this program to build it out, but I keep hitting the API rate limit, which times out the request for an increasing amount of time. Is there a way to tell the program to limit its requests so that the data coming in is steady but not hitting the 5000 requests per hour threshold?
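
One way to keep the request rate steady, sketched under the assumption that the X-RateLimit-Remaining and X-RateLimit-Reset headers from each API response are available to the caller. The function name pause_before_next_request is hypothetical; it spreads the remaining budget evenly over the time left in the window.

```python
import time


def pause_before_next_request(remaining, reset_epoch, now=None):
    """Return seconds to sleep so the remaining requests last until the reset.

    remaining   -- value of the X-RateLimit-Remaining response header
    reset_epoch -- value of the X-RateLimit-Reset header (UTC epoch seconds)
    now         -- current epoch time; defaults to time.time()
    """
    now = time.time() if now is None else now
    seconds_left = max(reset_epoch - now, 0)
    if remaining <= 0:
        # Budget exhausted: wait for the window to reset.
        return seconds_left
    return seconds_left / remaining
```

Sleeping for the returned value between requests trades some speed for never running into the limit mid-backup.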

backing up private repos on org with different user.

Hi, I am trying to backup an org that has public and private repos. I am using a service account with a token with privileges on public and private repos, but this line:

if args.private:

prevents the backup from running because org != user. A quick test commenting out that line got it working, but I'd like to understand the purpose of that check and whether removing it will have some unwanted side effect.

Backup sometimes hangs

We are running the backup as a cron job on an hourly basis. I've noticed that the process sometimes hangs.

Command: /usr/local/bin/github-backup -O -P -F --all --prefer-ssh -t <token> -o /home/info/github-backup <repo> > /dev/null

Attaching strace to the process shows nothing:

sudo strace -p 19794 
Process 19794 attached - interrupt to quit
read(4,

It doesn't show any further progress.

Do you need other information from the process? The process is still hanging on my server.

Skipping .. since it's not initialized?

I am parsing the readme, but only saw this PR: #11. What are the criteria here? Why won't the repos clone? They are definitely there and we use them every day with active code.

Skipping name (git@github.com:ORG/name.git) since it's not initialized

I know it exists fine:

$ git ls-remote https://github.com/ORG/name.git
Username for 'https://github.com': name
Password for 'https://name@github.com':
<hash>        HEAD
<hash>        refs/heads/master
<hash>        refs/remotes/origin/HEAD
<hash>        refs/remotes/origin/master

Backing up multiple repositories but not the whole account

I'd like to back up multiple repositories from my account, but not the whole thing, as I have forked various big repositories I just made some small (or no) contributions to. I found no way to do this out of the box, as it is only possible to select a single repository for backup.
Calling the program multiple times with different repositories and the same backup directory does not work either, as the last_update state is not tracked per repository, but only once in the backup directory root.

Would you agree on fixing this, either by making it possible to give multiple repositories to backup, or by moving the last_update state into the repository directory in the backup dir (or preferably both)? I'd be willing to provide pull requests for at least a part of the proposed changes.
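
As a starting point for the second option, here is a minimal sketch of keeping the state per repository. The function names and the default file name are hypothetical; I am only assuming the state is a single timestamp string, as with the current last_update file in the backup root.

```python
import os


def read_last_update(repo_dir, filename='last_update'):
    """Return the stored timestamp for one repository, or None if absent."""
    path = os.path.join(repo_dir, filename)
    if not os.path.exists(path):
        return None
    with open(path) as handle:
        return handle.read().strip()


def write_last_update(repo_dir, timestamp, filename='last_update'):
    """Store the timestamp inside the repository's own backup directory."""
    os.makedirs(repo_dir, exist_ok=True)
    with open(os.path.join(repo_dir, filename), 'w') as handle:
        handle.write(timestamp)
```

With the state stored this way, separate runs for different repositories into the same backup directory would no longer overwrite each other's timestamp.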

Quits after resuming from rate limit sleep

When the rate limit is exceeded we sleep until the hourly requests are reset. After waking up the script prints the rate limit error ('No more requests remaining') then quits instead of resuming backup.

Backup of private organization does nothing, no errors

github-backup <NAME> --private --organization --output-directory ~/gitbak --all

Retrieving repositories
Filtering repositories
Backing up repositories
Retrieving <NAME> starred repositories
Writing 0 starred repositories to disk
Retrieving <NAME> watched repositories
Writing 0 watched repositories to disk

$ tree ~/gitbak/
/home/user/gitbak/
└── account
    ├── starred.json
    └── watched.json

1 directory, 2 files

It would also be nice to have a --debug option for additional output of what is going on. I can clone our repos via ssh just fine. Leaving out the ssh option did not make a difference.

Does this support LFS clones?

We are using Git LFS (Large File Storage) in our repositories.

From our testing, we see that this is not supported by this script, correct?

If it's not supported, do you have any plans to add support for LFS clones?

not running on windows

This does not run on Windows because select.select is only supported on sockets there, not on other streams (stderr, for example).
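
A possible portable alternative, sketched with a hypothetical helper named run_and_capture: subprocess's communicate() drains both pipes without calling select.select on them, so it also works on Windows. Note that it buffers all output until the child exits instead of streaming it, so it is not a drop-in replacement for a select loop.

```python
import subprocess


def run_and_capture(argv):
    """Run argv and return (returncode, stdout, stderr) as text.

    Hypothetical helper for illustration; not part of github-backup.
    """
    process = subprocess.Popen(
        argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() reads both pipes to the end and waits for the process.
    stdout, stderr = process.communicate()
    return (process.returncode,
            stdout.decode('utf-8', 'replace'),
            stderr.decode('utf-8', 'replace'))
```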

Also clone starred repositories?

Having a JSON backup of what repositories are starred is fine, so long as the repositories themselves are never removed from Github. Sometimes they disappear so it would be useful to optionally clone all stars.

Is this beyond the scope of the project?

Thanks!

Is there a bare-bones working example somewhere?

I've been poking around trying to get this to work and I keep getting the "too few arguments" error. Digging into the source code now, but a plain ol' example would be a nice addition to the readme. Thanks.

Don't stop everything if hooks are not available

When you don't have admin access to the repo (for example if you only have contributor access), you don't have the right to see hooks.
Currently that means getting a

Retrieving <user> hooks
API request returned HTTP 404: Not Found

and then the program exits

It should indeed display a warning, but it shouldn't stop there. (you might have the rights to get the hooks in some repo and not others)
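
Roughly what I would expect, as a sketch. The function collect_hooks and the fetch_hooks callable are hypothetical stand-ins for whatever the script uses to request hooks; the point is only that a 403 or 404 for one repository is logged and skipped rather than ending the run.

```python
from urllib.error import HTTPError


def collect_hooks(repositories, fetch_hooks, warn=print):
    """Fetch hooks per repository, skipping repositories that deny access.

    fetch_hooks is a hypothetical callable that returns a list of hooks for
    a repository name or raises urllib.error.HTTPError.
    """
    hooks_by_repo = {}
    for name in repositories:
        try:
            hooks_by_repo[name] = fetch_hooks(name)
        except HTTPError as error:
            if error.code not in (403, 404):
                # Unexpected failures should still stop the backup.
                raise
            warn('Skipping hooks for {0}: HTTP {1}'.format(name, error.code))
    return hooks_by_repo
```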

Logging / verbose / tail

Hi

Is there some way to export a log-file at the end of a backup-job? I would like to send a report to a Slack channel through a WebHook.

Thanks

Abandoned branches (force push) and changed labels

It would be nice if the incremental backups could detect force pushes or changes to tags and keep the abandoned trees around with a specific label. Maybe a reflog-style feature? Or just generate well-known branches? This way we don't need to fear that backing up would lose old state (without having multiple full backups).

Feature Request: Restore

The backup works well, but in order to be a valid backup, there has to be some way to restore all that information back to GitHub. I know this is a monolithic request, possibly more than doubling the work put into the project so far. Just thought I'd put this out there since I didn't see any past issues about it.

NameError: global name 'arg' is not defined

Ran into a python error while testing backing up my organization's repositories.

# github-backup $ORGANIZATION --output-directory ./ --organization --token $GITHUB_ACCESS_TOKEN --all --private --fork --prefer-ssh --name-regex $REPO_NAME
Backing up user <REDACTED> to /root/github_backups
Retrieving repositories
Filtering repositories
Backing up repositories
Traceback (most recent call last):
  File "/usr/local/bin/github-backup", line 885, in <module>
    main()
  File "/usr/local/bin/github-backup", line 880, in main
    backup_repositories(args, output_directory, repositories)
  File "/usr/local/bin/github-backup", line 555, in backup_repositories
    lfs_clone=arg.lfs_clone)
NameError: global name 'arg' is not defined

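It looks like there is a reference to an undefined variable in backup_repositories (line 555 of the script, per the traceback above): lfs_clone=arg.lfs_clone.

It is probably just a typo, arg instead of args. I tested changing this to args and the backup script worked great!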

Add tests

There should be tests for this. It can start off as just a shell script that runs github-backup to make sure it at least can display the help message without erroring. It can later be turned into unit tests but #50 needs to happen first.
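
Until the refactor happens, the smoke test could equally be a few lines of Python. This is a minimal sketch with a hypothetical helper name, help_exits_cleanly; it runs a command with -h and checks that it exits with status 0 and prints something.

```python
import subprocess


def help_exits_cleanly(command):
    """Return True if `command -h` exits with status 0 and prints output.

    command is a list such as ['github-backup'].
    """
    result = subprocess.run(
        command + ['-h'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return result.returncode == 0 and bool(result.stdout.strip())
```

A test runner could then simply assert help_exits_cleanly(['github-backup']) on a machine where the script is installed.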

only 3 out of 80+ repos pulled down on an org

github-backup -u mikebz -p **** --all -O tempoautomation

the organization has about 80 repos. I ran the script with --all and -P and it only pulled a few down. Not sure what the issue is, but it doesn't seem to want to traverse the full set of repos.

Is there a trick to get this script to work?
