g-node / gin-cli

Command line client for GIN

Home Page: https://gin.g-node.org

License: Other

Go 91.36% Python 6.94% Batchfile 0.81% Shell 0.24% Makefile 0.56% Dockerfile 0.09%

gin-cli's Introduction

GIN-CLI

G-Node Infrastructure Command Line Client

This package is a command line client for interfacing with repositories hosted on GIN. It offers a simplified interface for downloading files from, and uploading files to, those repositories.

It consists of commands for interfacing with the GIN web API (e.g., listing repositories, creating repositories, managing SSH keys) but primarily, it wraps git and git-annex commands to make working with data repositories easier.

Information, setup, and guides

For installation instructions see the GIN Client Setup page.

General information, help, and guides for using GIN can be found on the GIN Info Wiki. Help and information for the client in particular can be found on the following pages:

gin-cli's People

Contributors

achilleas-k mpsonntag cgars ajkswamy hkchekc nickdelgrosso

gin-cli's Issues

Host key verification failed

On new systems where the server's SSH host key has not yet been added to the known_hosts file, the git commands fail because the host key is unknown and the user is not prompted to accept it.

The SSH option is
ssh -o StrictHostKeyChecking=no [email protected]
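
A minimal sketch of how a Go client could pass this option to every git call, assuming git reads the GIT_SSH_COMMAND environment variable (available since git 2.3). The helper name gitCommand is hypothetical and not taken from the gin-cli source. Note that disabling the check also removes the protection the host key prompt provides.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// sshCommand is the ssh invocation git should use. The option is the one
// quoted in the issue; it suppresses the interactive host key prompt.
const sshCommand = "ssh -o StrictHostKeyChecking=no"

// gitCommand (hypothetical helper) builds a git command whose ssh calls
// carry the option above via the GIT_SSH_COMMAND environment variable.
func gitCommand(args ...string) *exec.Cmd {
	cmd := exec.Command("git", args...)
	cmd.Env = append(os.Environ(), "GIT_SSH_COMMAND="+sshCommand)
	return cmd
}

func main() {
	cmd := gitCommand("ls-remote", "origin")
	// Show the environment entry that was added; no server is contacted.
	fmt.Println(cmd.Env[len(cmd.Env)-1])
}
```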

Logout confirmation message

gin logout currently prints nothing (unless there's a problem). It should confirm logout.

Check space before download

Check if there is enough space on the drive before downloading.
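
A rough sketch of such a check, assuming a Linux or macOS system where syscall.Statfs is available (Windows would need a different call). The helper names freeBytes and hasSpace are made up for this example.

```go
package main

import (
	"fmt"
	"syscall"
)

// freeBytes reports the space available to unprivileged users on the
// filesystem that holds path. syscall.Statfs is Unix-only.
func freeBytes(path string) (uint64, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return 0, err
	}
	return st.Bavail * uint64(st.Bsize), nil
}

// hasSpace (hypothetical helper) checks whether a download of need bytes
// would fit on the filesystem that holds path.
func hasSpace(path string, need uint64) (bool, error) {
	free, err := freeBytes(path)
	if err != nil {
		return false, err
	}
	return free >= need, nil
}

func main() {
	ok, err := hasSpace(".", 1<<20) // is there room for 1 MiB here?
	fmt.Println(ok, err)
}
```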

Do not use agent key for different logged in user

Consider the following scenario:

Alice logs in using the client: gin login alice
Alice uploads her key to gin-auth: gin keys --add ~/.ssh/id_rsa.pub
Alice does some work using her key in her SSH agent
Alice logs out: gin logout
Bob logs in on the same workstation: gin login bob
Bob tries to perform a git transaction that Alice is not permitted to do: gin get bob/bobs_data

In the final step, the following will occur:

The client will check to see if there's an SSH agent running
It will find Alice's SSH agent and key
It will use Alice's key to perform a git clone
When Alice's key is sent to gin-repo, the repo service will query gin-auth and auth will respond that the key belongs to Alice.
Bob will get an error that he is not authorised to download bob/bobs_data, because he is trying to authenticate using Alice's key.

Solution: Before using the key found in the agent, check with gin-auth that the currently logged in user (if any) matches the owner of the key.
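
The proposed check could look roughly like the sketch below. The ownerLookup type stands in for the query to gin-auth; its signature, and all names here, are assumptions made for illustration rather than the real gin-auth API.

```go
package main

import (
	"errors"
	"fmt"
)

// ownerLookup stands in for a query to gin-auth: it maps a public key to
// the username that registered it. The signature is an assumption.
type ownerLookup func(pubKey string) (string, error)

var errKeyMismatch = errors.New("agent key does not belong to the logged-in user")

// checkAgentKey returns nil only when the key's owner is the logged-in user.
func checkAgentKey(lookup ownerLookup, pubKey, loggedIn string) error {
	owner, err := lookup(pubKey)
	if err != nil {
		return err
	}
	if owner != loggedIn {
		return errKeyMismatch
	}
	return nil
}

func main() {
	// Fake lookup: every key found in the agent belongs to alice.
	lookup := func(string) (string, error) { return "alice", nil }
	fmt.Println(checkAgentKey(lookup, "ssh-rsa AAAA...", "bob"))
}
```

When the check fails, the client could fall back to a temporary key for the logged-in user instead of the agent key.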

Add license file to release archives and installers

Make client secret configurable

Support multiple server configurations

It would be useful to have a mechanism for easily switching between multiple server configurations.

Selective download and upload

By far the most requested feature is being able to selectively sync files to and from the server. This is particularly useful for working with very large repositories without cloning all the data locally.

Split out client library code

A new repository will be created for common code that will be shared between all user clients (e.g., gin-client-lib). This should contain most of the code found in this repository that is reusable, e.g., packages client, repo, auth.

The current repository, gin-cli, will then be a thin client on top of gin-client-lib which simply parses command line arguments, calls library functions, and handles output.

Other clients, like a desktop GUI or a filebrowser plugin, will also use gin-client-lib for all operations.

Create commits when getting files from remote

When an annexed file's contents are made available locally (git annex get file), a commit needs to be created (and pushed?) to register the new location of the file. Is this the proper way of handling this or does it detect modifications because of the WORM backend?

Handle merge conflicts

I can think of a few ways to handle merge conflicts:

Pick either the remote or the local file to be the main version (consistently) and rename the other one to indicate it is a conflicting file (Dropbox style). I think this is the friendliest solution.
Rename both files with a timestamp of their last modification time. Not fun, since the user will be left with no file which carries the original name.
Ask the user what to do. Maybe this could give a few options: Let the user indicate which is the primary file and rename the other one (never delete), or allow them to cancel the operation.

Use external configuration file to define behaviour

A config file in $XDG_CONFIG_HOME/gin that can be used to configure the client's behaviour (e.g., host addresses for custom installations).

List of options:

Host address(es).
Key mechanism priority: file or agent.
File types that are added to git instead of annex.

Verbose flags

Add two verbose flags.

The standard -v or --verbose flag should print extended output.
A second one should also be added that prints the git and annex commands that are being executed (e.g., --git).

When both are used in combination, the output of the git commands should also be printed.
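
A small sketch of the two flags using Go's standard flag package, which accepts both the -name and --name spellings. The options type and function names are hypothetical; the real client may parse its arguments differently.

```go
package main

import (
	"flag"
	"fmt"
)

// options holds the two proposed switches.
type options struct {
	verbose bool // -v / --verbose: extended output
	showGit bool // --git: echo the git and git-annex commands being run
}

// parseOptions parses args and returns the options plus the remaining
// (non-flag) arguments.
func parseOptions(args []string) (options, []string, error) {
	var o options
	fs := flag.NewFlagSet("gin", flag.ContinueOnError)
	fs.BoolVar(&o.verbose, "v", false, "print extended output")
	fs.BoolVar(&o.verbose, "verbose", false, "print extended output")
	fs.BoolVar(&o.showGit, "git", false, "print git and git-annex commands")
	err := fs.Parse(args)
	return o, fs.Args(), err
}

// printCommandOutput reports whether the output of the backend commands
// should be shown, which is the case only when both flags are combined.
func (o options) printCommandOutput() bool { return o.verbose && o.showGit }

func main() {
	o, rest, err := parseOptions([]string{"-v", "--git", "upload"})
	fmt.Println(o.verbose, o.showGit, o.printCommandOutput(), rest, err)
}
```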

Library functions should not print output

Certain library functions, like GetRepos in repo/repo.go, print output. Anything that's in a subpackage should be a library function and so all console output should be handled by main.

We should consider how to handle partial progress status printing in these cases.

Use MIME type to decide between versioning or annexing a file

All data files should be annexed: binary formats, images, large files.
Text files (readmes, code, etc) should be committed to git directly and versioned.
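
One way to sketch the decision is with http.DetectContentType from the Go standard library, which sniffs a content type from the first bytes of a file. The rule below (annex everything that is not text/*) is an assumption for illustration; it ignores file size, so large text files would need a separate check.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// shouldAnnex sniffs the leading bytes of a file (DetectContentType
// considers at most the first 512) and annexes anything that is not text.
func shouldAnnex(head []byte) bool {
	mime := http.DetectContentType(head)
	return !strings.HasPrefix(mime, "text/")
}

func main() {
	readme := []byte("# My dataset\nRecorded in 2017.\n")
	png := []byte("\x89PNG\r\n\x1a\n") // PNG file signature
	fmt.Println(shouldAnnex(readme), shouldAnnex(png))
}
```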

Read annex.largefile value from external configuration file

Better error messages

The error messages for the following cases are puzzling or uninformative:

gin info <user> when <user> doesn't exist.
gin get <reponame> when <reponame> doesn't exist.
gin repos <user>: regardless of whether <user> exists or has no public/shared repositories, currently the message is the same (because the server returns 404). On 404, the client should check if <user> exists and return with appropriate feedback.

Repository functionality

Should support:

Create repository
Upload data to repository
Download data from repository

EDIT: Not going to implement editing repository info. Can be done on the web and just changing the description on the client is unnecessary. Closing.

Log Go 'err' as well as stdout and stderr when exec command fails

See #67 for example.
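
A minimal sketch of what this could look like with os/exec: both output streams are captured, and on failure they are reported together with the Go error. The helper name runLogged is hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// runLogged (hypothetical) runs a command and, on failure, returns an error
// that carries the Go error together with the captured stdout and stderr.
func runLogged(name string, args ...string) (string, error) {
	var stdout, stderr bytes.Buffer
	cmd := exec.Command(name, args...)
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		return stdout.String(), fmt.Errorf("%s failed: %w\nstdout: %s\nstderr: %s",
			name, err, stdout.String(), stderr.String())
	}
	return stdout.String(), nil
}

func main() {
	_, err := runLogged("git", "not-a-real-subcommand")
	fmt.Println(err)
}
```

The Go error matters because some failures, such as a missing binary, produce no stdout or stderr at all.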

Use Ed25519 for temporary keys

Documentation

Document all options properly. Currently, all we have is the output of gin cmdhelp in the repo readme.
Advanced documentation with explanation of what the client does in the background.
Create tutorial for common use cases.
Add basic documentation into the packaged archives.

Status output for time-consuming operations

This isn't exactly like issue #7 (Progress Bars), but it is more about informing the user about what is happening, especially when an operation can take a while.

When working with very large files, some steps in an operation, like computing the hash when adding new files, can take a long time. A progress bar for data transfers wouldn't help here; there's a long computation occurring before the actual transfer begins. What we need is a message in the style of Preparing files for upload..., perhaps even a file count progress indicator.

Basically, we need anything that informs the user that something is happening, otherwise it looks like the program is hanging.

Progress bar or percentage for upload and download

Support annex v5

When v6 is not available, fall back to v5 and unlocked files or direct mode.

No upload with committed but non-synced annexes

Just because there are no (local) changes does not, IMHO, necessarily mean that "everything" has been uploaded (consider upload errors, both permanent and, more importantly, temporary), so I am not sure that this is a sufficient test.

Error checking in repo.CloneRepo

Limit commit message length

When adding many (thousands of) files for upload, the commit message becomes long enough that it exceeds the limits of the command and the "annex push" command fails.
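
One possible fix, sketched below, is to list file names only while the list is short and fall back to a count otherwise. The message format and the cut-off of 10 names are assumptions for illustration, not what the client currently writes.

```go
package main

import (
	"fmt"
	"strings"
)

// maxListed is an arbitrary cut-off chosen for this sketch.
const maxListed = 10

// commitMessage lists file names only while the list stays short; beyond
// maxListed names it falls back to a count so the message stays bounded.
func commitMessage(files []string) string {
	if len(files) > maxListed {
		return fmt.Sprintf("gin upload: %d files", len(files))
	}
	return "gin upload: " + strings.Join(files, ", ")
}

func main() {
	fmt.Println(commitMessage([]string{"README.md", "data/run1.dat"}))
	fmt.Println(commitMessage(make([]string, 5000)))
}
```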

Download repository on create

Perhaps we should prompt the user for this, but it would probably be very convenient and low cost to do a clone after creating a repository on the server, especially considering the repository will be empty.

Create activity log

Make the client log all activities, while storing as little identifiable information as possible. The log should help in debugging during development, but also in troubleshooting client problems.

Use git annex in direct mode

Would make more sense to the user (their files would not be replaced by symlinks) and is also required for gin annex on Windows.

Make fewer annex calls during ls

Currently performs one per file.
This is bad.

Upload command does not add dotfiles

This is a tricky one.
Should we add dotfiles?

I might rethink this if I add a command that does git add exclusively.
Also under consideration is a command that performs git add $* && git commit ..., to record changes locally without necessarily uploading.

Command to make repository out of current directory

I imagine a scenario where a user has a directory with data and just wants to turn it into a gin repository. We could have a command that creates a remote and, instead of using gin get (git clone), the client performs git init, git annex init, sets defaults, adds the newly created repository as a remote, and begins tracking the current directory.

I'd like to think about this more and hear other people's thoughts on this though. One downside is that this function enables users to start using their primary data storage as a gin repository. Forcing users to create a separate directory to initialise as a gin repo and having them copy their data manually, also forces them to think about which data they want to push.

Write tests

gin get for clone

Separate the git clone command from the git pull command.

Currently they are both handled by gin download. To avoid ambiguity, cloning should be handled by a new command called gin get.

Windows config and token directory

With the current system, which tries to rely on XDG_CONFIG_HOME, on Windows the token is stored in the working directory.

Help for individual commands

People expect to be able to use

gin help keys

to get help on the keys command.

Same for

gin keys --help

Would be nice to have both.

Add ability to add keys from the command line

Should prompt for a pubkey filename or accept filename on command line.

Create key pair and load user key from file for transactions

Currently loads directly from agent.
Eventually, the client should be able to generate its own key pair, upload the public key to the server, and directly load the private key from a local file.

Keys in an SSH agent could be used based on a global configuration setting.

Don't create unnecessary commits.

Pull should do a fast forward when possible without creating a new commit.
Uploading also creates a commit even when there are no changes to any of the files in the repository.

Configurable git and git annex binary paths

Ask users to add keys in documentation/tutorial

An interesting edge case:
A user has a tool (e.g., shell prompt) that relies on being able to speak to a git remote. If their key isn't set up with gin, and they rely on the temporary key handling in the client, this will cause issues (specifically, it will prompt for a password).

In the tutorial or the basic docs, we should have a section that explicitly asks users to add their keys. We should also tell users that don't have SSH keys or don't generally use git to ignore this section.

Descriptive errors

Some common, very specific errors give vague messages.
One in particular was caught a few times during the BrainHack workshop.

When a gin get is called and the directory already exists, the command fails with a vague error message. It would be trivial to distinguish this case and give an informative error message.
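
A sketch of such a check, run before the clone is attempted. The function name and the wording of the message are assumptions.

```go
package main

import (
	"fmt"
	"os"
)

// checkTarget (hypothetical) is meant to run before cloning, so that an
// existing directory yields a specific message instead of a generic failure.
func checkTarget(dir string) error {
	_, err := os.Stat(dir)
	if err == nil {
		return fmt.Errorf("cannot get repository: %q already exists", dir)
	}
	if os.IsNotExist(err) {
		return nil // nothing in the way, free to clone
	}
	return err // e.g. permission denied while inspecting the path
}

func main() {
	fmt.Println(checkTarget(".")) // the current directory always exists
}
```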

There are other errors like this that could use work.

UPDATE
Checklist of errors to handle:

Directory already exists
Permission denied on git command
File not found (e.g., on gin get-content)
Unable to unlock file with no content
Unable to lock, unlock, get-content, or remove-content of untracked or non-annexed files
Unable to run any git commands outside a git repository
Unable to run any command on file that does not exist
Not enough free space (git annex performs the check itself. Related issue #21)

Change behaviour of repos command

Current behaviour:

By default, it lists public repositories.
If username is specified, it lists the repositories owned by username and accessible by me (the logged in user). These can be either shared or public.

New behaviour:

By default, list my repositories.
Provide a flag that lists repositories shared with me.
Provide a flag that lists public repositories.
If username is specified, same as current behaviour.

Specify key is temporary when sending temporary keys to gin-auth

Currently does not specify and keys default to non-temporary.

Can't accept host key when using gin for the first time

Since the commands run by the command line client aren't interactive, the user isn't prompted to accept the host key of the gin repo server when the first git command is issued.

One way around this is to add -o StrictHostKeyChecking=no to all git{,-annex} commands.

Another way would be to see if we can detect the prompt and pass it on to the user. This wasn't an issue before since gin-cli didn't capture stdout. But with the newest changes in logging, this will change. I decided that capturing stdout is important for troubleshooting and that the user shouldn't be presented with the output of the backend commands.

We could also have a command that simply attempts to connect to the git server (similar to repo.Connect()) and prompts the user to accept the key. I don't really like this last idea though.

Packaging and distribution

Need an automated way to package the client for different OSes and Linux distros (deb, rpm, etc).

Will need to figure out which distributions (and distribution versions) require bundling annex v6.

Add current user to commit messages

When doing a gin upload, consider using logged in user info to create the commit signature. Though this might break things for people who have a git configuration.

Perhaps a compromise could be to let the git configuration set the commit Author, but add the logged in user's info (username, real name) to the commit message.

Annex modes

I'm having trouble settling on the mode of operation for the client with respect to git annex. I'd like to describe all the cases and what they entail to see if anyone else has any idea how to best handle this.

Let me start with a summary of the situations and go into more detail below.

| Mode | Advantages | Disadvantages | Notes |
| --- | --- | --- | --- |
| Indirect mode | Default mode. Safe. | Doesn't work on Windows (requires symlinks). Could be confusing for the user. | Although this is the current default, it seems to be heading towards being deprecated for the v6 mode. |
| Direct mode | Closer to what a user unfamiliar with git-annex would expect. | Deprecated as of version 6. Unsafe, since the user is always working with "unlocked" files. | Although deprecated, I'm including this mode as we might consider it for compatibility with distributions that still run v5 of git-annex. |
| Version 6 unlocked | Best of both modes above. Files aren't symlinks but it's also not unsafe. Can use normal git commands and configuration options to commit files into annex. | Doubles the disk space usage for annexed files. Requires newest version of git-annex. | |
| Version 6 unlocked thin | Resolves issue with double disk space usage. | Unsafe, since there is no unmodified copy in `.git/annex/objects` when working on a file. | |

My issue here is with trying to balance convenience for the user and data safety. The last mode, V6 thin, is pretty convenient: it behaves like most users would expect (e.g., no symlinks and no unlocking required) and it doesn't use up the extra disk space.

Since I started writing this I've experimented with all modes and I have some ideas about how to use v6 unlocked thin mode while also making sure the user is always aware of the safety of their data.

In thin mode, there is no local backup of the last checked in version of the annexed file. A file's state is only backed up once an upload to a remote is complete. If we stick to this mode, we could provide commands (and in the future, GUI indicators) showing which of the files in the current working directory are in a backed up state. In other words, we can tell a user if their latest changes have been recorded on the gin servers.

I'm going to follow up this issue with some more details on the other modes and how we can handle various cases, but in the meantime feel free to add comments, suggestions, and share concerns.

Sources: