
cdis-data-client's Introduction

gen3-client


gen3-client is a command-line tool for downloading, uploading, and submitting data files to and from a Gen3 data commons.

Read more about what it does and how to use it in the gen3-client user guide.

gen3-client is built on Cobra, a library providing a simple interface to create powerful modern CLI interfaces similar to git & go tools. Read more about Cobra here.

Installation

(The following instructions are for compiling and installing gen3-client from source code. Prebuilt binary executables are also available here.)

First, install Go and the Go tools if you have not already done so. Set up your workspace and your GOPATH.

Then:

go get -d github.com/uc-cdis/gen3-client
go install

TODO: Remove after GitHub repo is renamed. For now, the above will not work because the GitHub repo needs to be renamed. Do this instead:

mkdir -p $GOPATH/src/github.com/uc-cdis
cd $GOPATH/src/github.com/uc-cdis
git clone git@github.com:uc-cdis/cdis-data-client.git
mv cdis-data-client gen3-client
cd gen3-client
go get -d ./...
go install .

You should now have gen3-client installed. For comprehensive instructions on how to configure and use gen3-client to upload and download object files, please refer to the gen3-client user guide.

Enabling New Gen3 Object Management API

Some Gen3 data commons support uploading files through the new Gen3 Object Management API.

NOTE: The service powering this API is sometimes referred to as our object "Shepherd".

To enable gen3-client to upload using the Gen3 Object Management API, pass the --use-shepherd=true flag to gen3-client configure, e.g.:

$ gen3-client configure --profile=myprofile --cred=/path/to/cred --apiendpoint=https://example.com --use-shepherd=true

If this flag is set, the gen3-client will attempt to use the Gen3 Object Management API to upload files, falling back to Fence/Indexd in case of failure.

You may also need to configure the minimum version of the Gen3 Object Management API that the client will interact with. It defaults to Gen3 Object Management API v2.0.0, but can be raised or lowered by passing the --min-shepherd-version flag to gen3-client configure, e.g.:

$ gen3-client configure --profile=myprofile --cred=/path/to/cred --apiendpoint=https://example.com --use-shepherd=true --min-shepherd-version=1.3.0
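As an illustration of what a minimum-version setting implies, the following Go sketch compares a server version against a configured minimum. It is not taken from the gen3-client source: the helper names parseVersion and meetsMinimum are hypothetical, and it assumes plain major.minor.patch version strings.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits a "major.minor.patch" string into three integers.
// Hypothetical helper; assumes no pre-release or build suffixes.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("expected major.minor.patch, got %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid version component %q: %w", p, err)
		}
		out[i] = n
	}
	return out, nil
}

// meetsMinimum reports whether server is greater than or equal to minimum.
func meetsMinimum(server, minimum string) (bool, error) {
	s, err := parseVersion(server)
	if err != nil {
		return false, err
	}
	m, err := parseVersion(minimum)
	if err != nil {
		return false, err
	}
	// Compare major, then minor, then patch; the first difference decides.
	for i := 0; i < 3; i++ {
		if s[i] != m[i] {
			return s[i] > m[i], nil
		}
	}
	return true, nil
}

func main() {
	ok, err := meetsMinimum("1.4.2", "1.3.0")
	fmt.Println(ok, err)
}
```

Under this sketch, lowering the minimum to 1.3.0 would accept a server reporting 1.4.2, while the default of 2.0.0 would not.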

Uploading Additional File Object Metadata to Gen3 Object Management API

The Gen3 Object Management API supports uploading additional public access file object metadata when uploading data files.

WARNING: Additional File Object Metadata is exposed publicly and thus should not contain controlled or sensitive data.

You can upload file metadata using the gen3-client upload command with the --metadata flag. E.g.:

gen3-client upload --profile=my-profile --upload-path=/path/to/myfile.bam --metadata

This tells gen3-client to look for a metadata file named myfile_metadata.json in the same folder as myfile.bam. In general, the metadata file must be located in the same folder as the file to be uploaded and must be named [filename]_metadata.json.

The metadata file should be a JSON file in the format:

{
    "authz": ["/example/authz/resource"],
    "aliases": ["example_alias"],
    "metadata": {
        "any": {
            "arbitrary": ["json", "metadata"]
        }
    }
}

The aliases and metadata properties are optional. Some Gen3 data commons require the authz property to be specified in order to upload a data file.

If you do not know what authz to use, you can look at your Profile tab or /identity page of the Gen3 data commons you are uploading to. You will see a list of authz resources in the format /example/authz/resource: these are the authz resources you have access to.
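The naming rule and file format described in this section can be sketched in Go as follows. This is only an illustration, not the client's actual code: the FileMetadata type and the metadataPath and parseMetadata helpers are hypothetical, and the sketch assumes that [filename] means the file name without its extension, as the myfile.bam example suggests.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path/filepath"
	"strings"
)

// FileMetadata mirrors the three top-level properties of the metadata file.
// Hypothetical type for illustration only.
type FileMetadata struct {
	Authz    []string               `json:"authz"`
	Aliases  []string               `json:"aliases,omitempty"`
	Metadata map[string]interface{} `json:"metadata,omitempty"`
}

// metadataPath returns the path where the metadata file is expected:
// same folder as the upload, extension replaced by "_metadata.json".
func metadataPath(uploadPath string) string {
	dir := filepath.Dir(uploadPath)
	base := filepath.Base(uploadPath)
	stem := strings.TrimSuffix(base, filepath.Ext(base))
	return filepath.Join(dir, stem+"_metadata.json")
}

// parseMetadata decodes the raw contents of a metadata file.
func parseMetadata(raw []byte) (FileMetadata, error) {
	var m FileMetadata
	if err := json.Unmarshal(raw, &m); err != nil {
		return m, fmt.Errorf("metadata is not valid JSON: %w", err)
	}
	return m, nil
}

func main() {
	fmt.Println(metadataPath("/path/to/myfile.bam"))

	m, err := parseMetadata([]byte(`{"authz": ["/example/authz/resource"]}`))
	fmt.Println(m.Authz, err)
}
```

Because aliases and metadata are optional, the sketch leaves them empty when they are absent rather than treating that as an error.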

cdis-data-client's People

Contributors

atharvar28, avantol13, bedfordwest, binamb, cterrazas2, giangbui, haraprasadj, jacob50231, m0nhawk, mfshao, michaellukowski, mpingram, nss10, paulineribeyre, philloooo, rgschmitz1, rnerella92, ronaldhshi, thanh-nguyen-dang, themarcelor, vpsx, vzpgb, zflamig


cdis-data-client's Issues

File transfer stalling from PDC

Hi,
I am trying to download files from PDC Bionimbus (ICGC data). Downloading multiple files in parallel causes downloads to stall erratically after only a few hundred KB.
This happens consistently and often. It occurs on a computing cluster with no internet connectivity issues.

Second issue: Even downloading files one at a time causes them to stall after a few GB on average. I can never download large files (> 10 GB).

PXD-2249 ⁃ File Downloads in Windows version

I get the following error when using the gen3-client in Windows command shell:

In this first example, I purposefully use the wrong profile ('bc') for trouble-shooting purposes:
C:\Users\cgmeyer>gen3-client download --profile bc --guid c2774501-e5c4-4acb-8b4b-23fe803205fb --file=./Documents/file_to_download.json
2018/10/29 12:49:35 Download error: The provided guid at url "https://data.braincommons.org/user/data/download/c2774501-e5c4-4acb-8b4b-23fe803205fb" is not found!

This second example is using a legitimate GUID and profile:

C:\Users\cgmeyer>gen3-client download --profile niaid --guid c2774501-e5c4-4acb-8b4b-23fe803205fb --file=./Documents/file_to_download.json
panic: invalid character '<' looking for beginning of value

goroutine 1 [running]:
github.com/uc-cdis/gen3-client/gen3-client/jwt.DecodeJsonFromString(0xc0422ea000, 0xcc3, 0x7cb9c0, 0xc04225e800, 0x400, 0xc042162240)
/home/travis/gopath/src/github.com/uc-cdis/gen3-client/gen3-client/jwt/utils.go:32 +0x106
github.com/uc-cdis/gen3-client/gen3-client/jwt.(*Functions).ParseFenceURLResponse(0xc042065c98, 0xc042162240, 0xc04200c440, 0x38, 0xc0422c2900, 0x41a)
/home/travis/gopath/src/github.com/uc-cdis/gen3-client/gen3-client/jwt/functions.go:104 +0x253
github.com/uc-cdis/gen3-client/gen3-client/jwt.(*Functions).DoRequestWithSignedHeader(0xc042065c98, 0xc04200e0e0, 0x5, 0x0, 0x0, 0xc04200c440, 0x38, 0xc04200c440, 0x38, 0xc04207c580, ...)
/home/travis/gopath/src/github.com/uc-cdis/gen3-client/gen3-client/jwt/functions.go:147 +0x455
github.com/uc-cdis/gen3-client/gen3-client/g3cmd.init.3.func1(0xc042114780, 0xc0420886e0, 0x0, 0x5)
/home/travis/gopath/src/github.com/uc-cdis/gen3-client/gen3-client/g3cmd/download.go:87 +0x17a
github.com/spf13/cobra.(*Command).execute(0xc042114780, 0xc042088690, 0x5, 0x5, 0xc042114780, 0xc042088690)
/home/travis/gopath/src/github.com/spf13/cobra/command.go:766 +0x2c8
github.com/spf13/cobra.(*Command).ExecuteC(0xac6e60, 0x1, 0x1, 0x87379d)
/home/travis/gopath/src/github.com/spf13/cobra/command.go:852 +0x311
github.com/spf13/cobra.(*Command).Execute(0xac6e60, 0xc042044058, 0x0)
/home/travis/gopath/src/github.com/spf13/cobra/command.go:800 +0x32
github.com/uc-cdis/gen3-client/gen3-client/g3cmd.Execute()
/home/travis/gopath/src/github.com/uc-cdis/gen3-client/gen3-client/g3cmd/root.go:27 +0x34
main.main()
/home/travis/gopath/src/github.com/uc-cdis/cdis-data-client/main.go:8 +0x27

C:\Users\cgmeyer>
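The message "invalid character '<' looking for beginning of value" is what Go's encoding/json reports when the input starts with "<", which usually means the server returned an HTML or XML page instead of JSON. A minimal sketch of decoding defensively, so that the caller gets an error rather than a panic, might look like this. The function name decodeJSONFromString is hypothetical and is not the project's jwt.DecodeJsonFromString.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// decodeJSONFromString parses a response body as a JSON object and returns
// an error (instead of panicking) when the body is not JSON.
// Hypothetical helper for illustration only.
func decodeJSONFromString(body string) (map[string]interface{}, error) {
	trimmed := strings.TrimSpace(body)
	// A leading '<' suggests an HTML or XML error page, not JSON.
	if strings.HasPrefix(trimmed, "<") {
		return nil, fmt.Errorf("expected JSON but the response looks like HTML or XML")
	}
	var out map[string]interface{}
	if err := json.Unmarshal([]byte(trimmed), &out); err != nil {
		return nil, fmt.Errorf("could not decode response as JSON: %w", err)
	}
	return out, nil
}

func main() {
	if _, err := decodeJSONFromString("<html><body>Not Found</body></html>"); err != nil {
		fmt.Println("Download error:", err)
	}

	msg, err := decodeJSONFromString(`{"url": "https://example.com/file"}`)
	fmt.Println(msg["url"], err)
}
```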

PXD-2304 ⁃ Support regex for data upload

Users would like to submit files by type as a batch, for example all the bam files under a folder, so they want to run gen3-client upload -filepath /folder/*.bam .

gen3-client update-new profile ${profileName} filepath /folder/*.txt

Check Authentication and Privileges

It would be nice to have a function that would return your data access privileges.

For example:
I just configured a profile "dcf" using credentials downloaded from my Profile page on the DCF commons. I could send this command:
gen3-client auth --profile=dcf

And get this returned upon successfully authenticating:

Programs:
DCF  [read, read-storage]
Projects:
DCF-CCLE [create, upload, delete, read, read-storage, write-storage]
ABC-xyz123 [read]
...

Get this returned if you have no access:

You don't currently have access to data from any projects at:
https://nci-crdc-demo.datacommons.io (or whatever you entered for --apiendpoint during configure)

Finally, get this returned if you entered bogus credentials or a bogus API endpoint:

Your profile 'dcf' is not configured correctly. Please, check that you've entered the correct API endpoint for your data commons, and ensure your credentials file has not expired.

This would help trouble-shoot data access issues with users.

hmac -> jwt

  • configure creds by passing a credentials.json downloaded from portal
    go configure --profile test --creds creds.json
  • request access_token using refresh_token, save it in the same profile config
  • take care of requesting new access_token when old one in config is expired
  • pass access_token in authorization header when calling gen3 apis

PXD-2479 ⁃ Provide link to datacommons.org/submission/files after upload

Would be nice if following a gen3-client upload command, the client provided a link to the file mapping. This is similar to the link you get when you do a git push origin repo like this:

remote: Create a pull request for 'feat/CMC' on GitHub by visiting:
remote:      https://github.com/uc-cdis/ibdgc-dictionary/pull/new/feat/CMC

The gen3-client could provide output like this (see last two lines):

gen3-client upload --profile=datacommons --upload-path=~/Documents/Notes/files/file.bam
Local history file "/Users/username/.gen3/datacommons.json" has opened

Begin parsing all file paths for "~/Documents/Notes/files/file.bam"
Finish parsing all file paths for "~/Documents/Notes/files/file.bam"

The following file(s) has been founded in path "~/Documents/Notes/files/file.bam" and will be uploaded:
	/Users/username/Documents/Notes/files/file.bam


Begin parsing all file paths for "/Users/username/.gen3/config"
Finish parsing all file paths for "/Users/username/.gen3/config"
Uploading data ...
file.bam  29 B / 29 B [========================================================================================================================================================] 100.00% 0s
Successfully uploaded file "/Users/username/Documents/files/file.bam" to GUID dg.0896/18d53086-1ce3-48b0-b685-a6c084a849cb.
Local history data updated in  /Users/username/.gen3/datacommons_history.json

To map your files to the appropriate nodes in the data model 
Please, visit: https://api-endpoint/submission/files/

Go CLI on top of aws go sdk for S3 functionalities

Use case:

  1. User provides an hmac keypair to save locally as a profile
  2. Do cdis-data-client data download $uuid to download an object from S3. This will first call user-api to ask for a presigned URL, then download the object with the presigned URL
  3. Do cdis-data-client data upload $uuid to upload an object to S3.
  4. Create a lambda listener to update the URL in signpost when a new object is uploaded

Support hmac4 authentication for talking to gdcapi

  1. Support configuring credentials and settings
    cdis-client configure --profile, similar to awscli configuration; it should prompt for credentials.
    The profile should be stored in
    ~/.cdis/config
    and its structure should look like
[cdis]
access_key=XXX
secret_key=XXX
gdcapi_endpoint=https://website/
  2. Support listing all projects

cdis-client list projects --profile test

PXD-2066 ⁃ fix help for configure: example shows 'creds' flag instead of 'cred'

~/Documents/Notes/gen3/gen3-client$ ./gen3-client configure
Error: required flag(s) "cred" not set
Usage:
gen3-client configure [flags]

Examples:
./gen3-client configure --profile=user1 --creds creds.json

Flags:
--cred string Specify the credential file that you want to use
-h, --help help for configure

Global Flags:
--profile string Specify profile to add or edit with --profile user2 (default "default")

PXD-1158 ⁃ Use original filename in 'download' mode

Currently, the user must specify a filename in download mode with the '--file' option. It would be nice if '--file' could be omitted from download commands; when it is not specified, the client would use either the name of the file in object storage or the 'filename' specified in the metadata.
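One possible way to pick a default name is sketched below in Go. The function downloadFilename is hypothetical. The sketch assumes the download goes through a presigned URL whose last path segment is the object's name, and that the server may send a Content-Disposition header; neither assumption is confirmed by this issue. It falls back to the GUID when no name can be derived.

```go
package main

import (
	"fmt"
	"mime"
	"net/url"
	"path"
)

// downloadFilename chooses a local file name when --file is not given.
// Order of preference: Content-Disposition filename, last segment of the
// URL path, then the GUID. Hypothetical helper for illustration only.
func downloadFilename(rawURL, contentDisposition, guid string) string {
	if contentDisposition != "" {
		if _, params, err := mime.ParseMediaType(contentDisposition); err == nil {
			// path.Base strips any directory part a server might include.
			if name := path.Base(params["filename"]); name != "." && name != "/" {
				return name
			}
		}
	}
	if u, err := url.Parse(rawURL); err == nil {
		if name := path.Base(u.Path); name != "." && name != "/" {
			return name
		}
	}
	return guid
}

func main() {
	fmt.Println(downloadFilename(
		"https://bucket.example.com/dir/sample.bam?X-Amz-Signature=abc",
		"",
		"c2774501-e5c4-4acb-8b4b-23fe803205fb",
	))
}
```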
