dpm-js's Introduction



Data Package Manager - in JavaScript

dpm is a library and command line manager for Data Packages.

Starting from v0.8.0, the package on NPM has been renamed to dpmjs.

Install

dpm is implemented in Node.js, so to install it, run:

npm install dpmjs -g

Command Line Usage

To get an overview and list of commands check out the command line help:

dpm --help

Using DPM programmatically

You can also use dpm programmatically.

var Dpm = require('dpmjs');
var dpm = new Dpm(conf);

dpm.install(['[email protected]', '[email protected]'], {cache: true}, function(err, dpkgs){
  //done!
});
dpm.on('log', console.log); //if you like stuff on stdout

Changelog

  • v0.8.0: renamed to dpmjs on NPM
  • v0.7.0: new ckan command
  • v0.6.0: much better validation via v0.2 of datapackage-validate

References

Previous dpm (python-based) can still be found at http://github.com/okfn/dpm-old.

dpm-js's People

Contributors

alvaropinot, mchelen, morty, ralphtheninja, roll, rufuspollock, sballesteros, stevage, waylonflinn, zhenyab

dpm-js's Issues

dpm validate error

I can't manage to get a valid datapackage.json, not even by running dpm init and then dpm validate. I always get:

DataPackage.json is Invalid
{
  "isFulfilled": false,
  "isRejected": false
}

Validation errors

I'm getting validation errors on a couple of fronts when doing dpm loads using the npm install v 0.7.6.

All errors are on OKFN core datapackages. The errors do not appear when I run the base CSVs through the http://datapackager.okfn.org web service.

First order errors are on metadata like source, maintainer, etc.

Second order are on the actual schemas contained in the core data packages.

Again, the data loads fine when I regenerate the datapackage.json file, removing associated metadata.

install (download/import) command

"install" (download/import) a data package onto disk (note integration in other apps is outside of scope here).

Motivating user story: I'm building an app, doing some analysis, etc., and I want to get a bunch of data from data packages into my project.

dpm install {url}
dpm install {git-url / github url}
dpm install {pkg-name}           # if we have a registry

Questions

  • into my project could mean: a) data files on disk at a standard location b) into a database c) into my tool. Here we focus only on (a)
  • Do I want entire data package or just the data resources (?) - let's get it all
  • What about ones that are in git - should we clone? (Ans: yes???)
  • Where do we install to? Ans: datapackages/{datapackage-name}/...
  • How do we install from a URL? Answer: get the datapackage.json and then download the resources?
  • What happens if the dp.json has urls and no path? Download to local and set path to local location
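
The last three answers (install to datapackages/{datapackage-name}/, fetch the resources listed in datapackage.json, and point url-only resources at a local copy) could be sketched as a pure function. This is only a sketch under assumptions, not dpm's implementation: localizeResources, the data/ subdirectory and the file-name derivation are all hypothetical, and the download step itself is left out.

```javascript
var path = require('path');

// Hypothetical helper: returns a copy of the descriptor in which every
// resource that has a `url` but no `path` gets a local `path`, plus the
// list of downloads needed to make those paths real.
function localizeResources(descriptor, rootDir) {
  var installDir = path.posix.join(rootDir, descriptor.name);
  var downloads = [];

  var resources = (descriptor.resources || []).map(function (resource) {
    if (resource.path || !resource.url) {
      return resource; // already local, or nothing to fetch
    }
    // Assumption: the last segment of the URL path is used as the file name.
    var pathname = new URL(resource.url).pathname;
    var fileName = path.posix.basename(pathname) || 'resource';
    var localPath = path.posix.join('data', fileName);

    downloads.push({
      from: resource.url,
      to: path.posix.join(installDir, localPath)
    });

    var copy = Object.assign({}, resource);
    copy.path = localPath;
    return copy;
  });

  return {
    descriptor: Object.assign({}, descriptor, { resources: resources }),
    downloads: downloads
  };
}

var localized = localizeResources({
  name: 'gold-prices',
  resources: [{ url: 'http://example.com/data/gold.csv' }]
}, 'datapackages');

console.log(localized.downloads);
console.log(localized.descriptor.resources);
```

Keeping the rewrite separate from the download makes it easy to test and leaves open whether files are fetched over HTTP or cloned from git.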

Running "dpm init" on directory with a csv file errors out

Not sure what is going on here. See stack trace:

/Users/honeybadger/.nvm/versions/node/v0.12.5/lib/node_modules/datapm/node_modules/datapackage/datapackage.js:287
  var parser = csv();
               ^
TypeError: object is not a function
    at Object.exports.createJsonTableSchema (/Users/honeybadger/.nvm/versions/node/v0.12.5/lib/node_modules/datapm/node_modules/datapackage/datapackage.js:287:16)
    at Object.exports.createResourceEntry (/Users/honeybadger/.nvm/versions/node/v0.12.5/lib/node_modules/datapm/lib/util.js:45:17)
    at /Users/honeybadger/.nvm/versions/node/v0.12.5/lib/node_modules/datapm/lib/util.js:70:13
    at Array.forEach (native)
    at Object.exports.createResourceEntries (/Users/honeybadger/.nvm/versions/node/v0.12.5/lib/node_modules/datapm/lib/util.js:69:13)
    at DPM.init (/Users/honeybadger/.nvm/versions/node/v0.12.5/lib/node_modules/datapm/lib/dpm.js:61:8)
    at DPM.run (/Users/honeybadger/.nvm/versions/node/v0.12.5/lib/node_modules/datapm/lib/dpm.js:28:10)
    at Object.<anonymous> (/Users/honeybadger/.nvm/versions/node/v0.12.5/lib/node_modules/datapm/bin/dpm.js:27:7)
    at Module._compile (module.js:460:26)
    at Object.Module._extensions..js (module.js:478:10)

I made sure I had "csv" module installed globally, and yep, still errored out!

Searching the ckan protocol yields CkanApiError

http://dpm.readthedocs.org/en/latest/manual.html#obtaining-a-package

dpm search ckan:// iso --debug
DEBUG:dpm.index:dpm: CKAN config: {'base_location': 'http://thedatahub.org/api', 'api_key': ''}
Error: Got redirected to another URL, which does not work with POSTS. Redirection: http://thedatahub.org/api/search/package -> http://datahub.io/api/search/package

[** For (lots) more information run with --debug **]
DEBUG:dpm.cli:Exception:
Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/dpm-0.10-py2.7.egg/dpm/cli/base.py", line 172, in main
    self.run(options, args)
  File "/Library/Python/2.7/site-packages/dpm-0.10-py2.7.egg/dpm/cli/standard.py", line 90, in run
    for pkg in index.search(query):
  File "/Library/Python/2.7/site-packages/dpm-0.10-py2.7.egg/dpm/index/ckan.py", line 51, in search
    for pkg_name in self.ckan.package_search(query)['results']:
  File "/Library/Python/2.7/site-packages/ckanclient-0.10-py2.7.egg/ckanclient/__init__.py", line 417, in package_search
    self.open_url(url, data, headers)
  File "/Library/Python/2.7/site-packages/ckanclient-0.10-py2.7.egg/ckanclient/__init__.py", line 233, in open_url
    raise CkanApiError(self.last_message)
CkanApiError: Got redirected to another URL, which does not work with POSTS. Redirection: http://thedatahub.org/api/search/package -> http://datahub.io/api/search/package

CKAN support

  • Push a Data Package (plus resource data) to CKAN (data goes in datastore)
  • Pull from CKAN - download a dataset (plus resources) as a Data Package

Data Validation Support

Validate (tabular) data against schema using https://github.com/okfn/json-table-schema-validator

  • Support validating all files or just one selected file from resources - just columns and types
    • Load from CSV
    • Pass to JSON Table Schema Validator
  • Support for constraints
  • Support pointing at a random data file (not just one already in datapackage.json)
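
The "just columns and types" step could look roughly like the sketch below. It does not use the JSON Table Schema Validator linked above; validateRows and the simplified type checks are hypothetical stand-ins, and the input is assumed to be a CSV that has already been parsed into a header row and rows of strings.

```javascript
// Hypothetical, simplified checks for a few JSON Table Schema types.
var typeChecks = {
  string: function () { return true; },
  integer: function (value) { return /^-?\d+$/.test(value); },
  number: function (value) { return value.trim() !== '' && isFinite(Number(value)); },
  boolean: function (value) { return /^(true|false)$/i.test(value); }
};

// Checks a parsed table against a schema of the form
// {fields: [{name: ..., type: ...}]} and returns a list of error messages.
function validateRows(header, rows, schema) {
  var errors = [];
  var fields = schema.fields;

  // Columns: the header must list the schema's field names in order.
  if (header.length !== fields.length) {
    errors.push('expected ' + fields.length + ' columns, found ' + header.length);
  }
  fields.forEach(function (field, i) {
    if (header[i] !== field.name) {
      errors.push('column ' + (i + 1) + ': expected "' + field.name + '", found "' + header[i] + '"');
    }
  });

  // Types: every cell must pass the check for its field's type.
  // Types without a check here are accepted as-is.
  rows.forEach(function (row, r) {
    fields.forEach(function (field, i) {
      var value = row[i] === undefined ? '' : String(row[i]);
      var check = typeChecks[field.type] || typeChecks.string;
      if (!check(value)) {
        errors.push('row ' + (r + 1) + ', field "' + field.name + '": "' + value + '" is not a valid ' + field.type);
      }
    });
  });

  return errors;
}

var validationErrors = validateRows(
  ['date', 'price'],
  [['2014-01', '1244.8'], ['2014-02', 'n/a']],
  { fields: [{ name: 'date', type: 'string' }, { name: 'price', type: 'number' }] }
);
console.log(validationErrors);
```

Constraints are deliberately not covered here.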

Help about validation error

Hi,
I have the error below but I do not understand the meaning.

Does the "name" field contain an invalid character?

{
"valid": false,
"errors": [
{
"message": "String does not match pattern: ^([a-z0-9._-])+$",
"code": 202,
"dataPath": "/name",
"schemaPath": "/properties/name/pattern",
"subErrors": null,
"type": "schema"
}
]
}

Thank you very much
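
The pattern in the error message only allows lowercase letters, digits, ".", "_" and "-", so uppercase letters and spaces are rejected. A small sketch for checking a name locally follows; the regular expression is copied from the message above, while invalidNameChars and the sample names are made up for illustration.

```javascript
// Pattern copied from the "String does not match pattern" message.
var NAME_PATTERN = /^([a-z0-9._-])+$/;

// Hypothetical helper: lists the distinct characters in a name that the
// pattern does not allow, in order of first appearance.
function invalidNameChars(name) {
  var bad = name.match(/[^a-z0-9._-]/g) || [];
  return bad.filter(function (ch, i) { return bad.indexOf(ch) === i; });
}

console.log(NAME_PATTERN.test('gold-prices')); // true
console.log(NAME_PATTERN.test('Gold Prices')); // false
console.log(invalidNameChars('Gold Prices'));  // [ 'G', ' ', 'P' ]
```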

Impossible to use dpm install

Hi,
I'm using Windows and I would like to create a data API for Data Packages, as I read here (GREAT): http://okfnlabs.org/blog/2014/09/11/data-api-for-data-packages-with-dpm-and-ckan.html

I started with:
dpm install https://raw.githubusercontent.com/SiciliaHub/albopretoriopa/master/datapackage.json

I get:

TypeError: Object #<Object> has no method 'parseSpecString'
    at Dpm.get (C:\Program Files\Bitnami Node.js Stack\nodejs\node_modules\datapackage\lib\index.js:149:21)
    at C:\Program Files\Bitnami Node.js Stack\nodejs\node_modules\datapackage\node_modules\async\lib\async.js:227:13
    at C:\Program Files\Bitnami Node.js Stack\nodejs\node_modules\datapackage\node_modules\async\lib\async.js:111:13
    at Array.forEach (native)
    at _each (C:\Program Files\Bitnami Node.js Stack\nodejs\node_modules\datapackage\node_modules\async\lib\async.js:32:24)
    at async.each (C:\Program Files\Bitnami Node.js Stack\nodejs\node_modules\datapackage\node_modules\async\lib\async.js:110:9)
    at _asyncMap (C:\Program Files\Bitnami Node.js Stack\nodejs\node_modules\datapackage\node_modules\async\lib\async.js:226:9)
    at Object.map (C:\Program Files\Bitnami Node.js Stack\nodejs\node_modules\datapackage\node_modules\async\lib\async.js:204:23)
    at Dpm.install (C:\Program Files\Bitnami Node.js Stack\nodejs\node_modules\datapackage\lib\index.js:138:9)
    at next (C:\Program Files\Bitnami Node.js Stack\nodejs\node_modules\datapackage\lib\index.js:124:10)

I do not know how to overcome this barrier.

I have installed dpm 0.7.2

Thank you

dpm ckan error

Hi,
I'm using 0.7.3 version in windows.

I have created a .dpmrc in my home folder and run:
dpm ckan datahub --owner_org=aborruso

This is my config file:
[ckan.datahub]
url = http://datahub.io/
apikey = myapi

And I get:
var config = ckanConfigs[ckanUrlOrName] || this.config.ckan[ckanUrlOrName];
^
TypeError: Cannot read property 'datahub' of undefined
at Dpm.ckan (C:\Program Files\Bitnami Node.js Stack\nodejs\node_modules\datapackage\lib\index.js:230:62)
at Object.<anonymous> (C:\Program Files\Bitnami Node.js Stack\nodejs\node_modules\datapackage\bin\dpm:80:7)
at Module._compile (module.js:456:26)
at Object.Module._extensions..js (module.js:474:10)
at Module.load (module.js:356:32)
at Function.Module._load (module.js:312:12)
at Function.Module.runMain (module.js:497:10)
at startup (node.js:119:16)
at node.js:906:3

It seems dpm does not read my config file. Am I wrong?

Thank you
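
The TypeError comes from indexing this.config.ckan when it is undefined. A guarded lookup along the following lines would at least turn the crash into a readable message. This is a hypothetical sketch (findCkanConfig and its error text are not part of dpm), and it does not explain why the config file was not picked up.

```javascript
// Hypothetical guarded version of the lookup shown in the stack trace.
// `ckanConfigs` holds built-in configurations, `config` is whatever was
// loaded from .dpmrc (possibly without a `ckan` section).
function findCkanConfig(ckanConfigs, config, ckanUrlOrName) {
  var fromRc = (config && config.ckan) || {};
  var found = (ckanConfigs || {})[ckanUrlOrName] || fromRc[ckanUrlOrName];
  if (!found) {
    throw new Error(
      'No CKAN configuration found for "' + ckanUrlOrName + '". ' +
      'Check the [ckan.' + ckanUrlOrName + '] section of .dpmrc.'
    );
  }
  return found;
}

// With a loaded config the section is returned as-is.
var datahub = findCkanConfig({}, {
  ckan: { datahub: { url: 'http://datahub.io/', apikey: 'myapi' } }
}, 'datahub');
console.log(datahub.url);

// Without a `ckan` section the caller gets a descriptive error instead of
// "Cannot read property 'datahub' of undefined".
var ckanLookupMessage = null;
try {
  findCkanConfig({}, {}, 'datahub');
} catch (err) {
  ckanLookupMessage = err.message;
}
console.log(ckanLookupMessage);
```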

stream support

stream command and method to stream a resource from a datapackage

Note we already have this in datapackage.js and just need to merge it across.

Add better documentation on using DPM programmatically

DPM itself bundles up datapackage-init, datapackage-read and other individual modules. If a developer wants to use these modules in an application, is it advised to use dpm or those modules individually?

If the former is ideal, perhaps we should have better documentation in the README about this. If the latter, perhaps we should note that as well :-)

Merge get and install (?)

Based on discussions, it may make sense to merge get and install (and possibly clone too) and replace them with flags to install. Note that we may still want install to have "aliases", e.g. download/import/get/clone/..., but they would be pure or close to pure aliases (i.e. aliases with flags).

Add option to install from file:/// URLs

Add an option that allows referencing a datapackage.json file stored locally, e.g. dpm install file:///path/to/folder/datapackage.json. We could also assume that when no protocol is specified, the argument is a local path, so that we can have dpm install /path/to/folder/.
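
A minimal sketch of how the argument could be resolved, assuming the rule proposed here (no protocol means local path). resolveSpec is a hypothetical helper, not an existing dpm function, and it ignores registry-style package names; url.fileURLToPath is a standard Node.js function.

```javascript
var path = require('path');
var url = require('url');

// Hypothetical helper: turns an install argument into the location of a
// datapackage.json. Anything that is not http(s) or file:// is treated as
// a path relative to `cwd`.
function resolveSpec(spec, cwd) {
  if (/^https?:\/\//i.test(spec)) {
    return { type: 'remote', location: spec };
  }
  var localPath = /^file:\/\//i.test(spec)
    ? url.fileURLToPath(spec)
    : path.resolve(cwd, spec);
  // A folder argument points at the datapackage.json inside it.
  if (path.basename(localPath) !== 'datapackage.json') {
    localPath = path.join(localPath, 'datapackage.json');
  }
  return { type: 'local', location: localPath };
}

console.log(resolveSpec('http://example.com/foo/datapackage.json', process.cwd()));
console.log(resolveSpec('data/my-package', process.cwd()));
```

On a POSIX system, both file:///path/to/folder/datapackage.json and /path/to/folder/ would resolve to the same local file.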

Node module name

Given that we plan to converge datapackage.js into here, I was wondering if we should call the core module "datapackage", the logic being that dpm is just the command line tool. (It would also resolve the issue that dpm is taken in the npm registry and we are using datapm at the moment.)

Furthermore, if we one day split things out again, I imagine we would have datapackage-init, datapackage-view, etc.

So options for core lib name:

  • DPM - current (note could lowercase)
  • datapackage

@sballesteros this is minor but nice to have it settled.

Datapackage name differences

The package name in the tree output at the end of running dpm install and the directory created to hold the downloaded datapackage may not match if the name in the datapackage.json is not the same as that returned by okfn/datapackage-identifier's parse function (which uses the URL to work out the name).

e.g.

curl http://example.com/foo/datapackage.json
{
  "name": "bar",
  ...
}

Using dpm install on this URL will put the files in datapackages/foo but the tree output at the end of the run will show datapackages/bar.
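
The mismatch can be reproduced with a small sketch. nameFromUrl is a hypothetical stand-in for the URL-based guess (the real parse function in okfn/datapackage-identifier may behave differently), and preferring the declared name in installName is just one possible resolution, not a decision recorded in this issue.

```javascript
// Hypothetical stand-in for a URL-based name guess: take the last path
// segment, ignoring a trailing datapackage.json.
function nameFromUrl(packageUrl) {
  var segments = new URL(packageUrl).pathname.split('/').filter(Boolean);
  if (segments[segments.length - 1] === 'datapackage.json') {
    segments.pop();
  }
  return segments[segments.length - 1];
}

// One possible fix: use the name declared in datapackage.json for both the
// directory and the tree output, falling back to the URL-based guess.
function installName(packageUrl, descriptor) {
  return descriptor.name || nameFromUrl(packageUrl);
}

var packageUrl = 'http://example.com/foo/datapackage.json';
console.log(nameFromUrl(packageUrl));                  // foo
console.log(installName(packageUrl, { name: 'bar' })); // bar
```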

semver

Hi,
when I install via "npm install datapackage -g" I get an unresolved dependency: semver.

If I run npm install semver -g, all is ok.

My test is on a Windows machine.

best regards

Running "dpm" with no args errors out

This is a trivial UX improvement, but it is really nice how running npm with no args lists the same commands that --help does. I think it would be nice if running dpm did the same instead of the current behaviour, which is:

~ $ dpm
dpm ERR! invalid command
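
The requested behaviour amounts to a small change in command dispatch. The sketch below is hypothetical: the command table, usage text and dispatch function are invented for illustration and do not mirror dpm's actual CLI code.

```javascript
// Hypothetical command table; the real dpm commands and help text differ.
var commands = {
  init: 'create a datapackage.json',
  install: 'download a data package',
  validate: 'validate a datapackage.json'
};

function usage() {
  var lines = ['Usage: dpm <command>', '', 'Commands:'];
  Object.keys(commands).forEach(function (name) {
    lines.push('  ' + name + '  ' + commands[name]);
  });
  return lines.join('\n');
}

// Returns what the CLI would print and the exit code it would use.
function dispatch(argv) {
  var command = argv[0];
  if (command === undefined || command === '--help') {
    // No arguments behaves like --help instead of being an error.
    return { output: usage(), code: 0 };
  }
  if (!Object.prototype.hasOwnProperty.call(commands, command)) {
    return { output: 'dpm ERR! invalid command\n\n' + usage(), code: 1 };
  }
  return { output: 'running ' + command, code: 0 };
}

console.log(dispatch([]).output);
```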

Installing has many warnings. Some about deprecation and security

> npm install datapackage -g
npm WARN deprecated [email protected]: This package is discontinued. Use lodash.iteratee@^4.0.0.
npm WARN deprecated [email protected]: use uuid module instead
npm WARN deprecated [email protected]: ReDoS vulnerability parsing Set-Cookie https://nodesecurity.io/advisories/130
npm WARN deprecated [email protected]: Please update to minimatch 3.0.2 or higher to avoid a RegExp DoS issue
npm WARN deprecated [email protected]: Please use https://www.npmjs.com/package/jsontableschema instead
npm WARN deprecated [email protected]: Please update to minimatch 3.0.2 or higher to avoid a RegExp DoS issue
npm WARN deprecated [email protected]: graceful-fs v3.0.0 and before will fail on node releases >= v7.0. Please update to graceful-fs@^4.0.0 as soon as possible. Use 'npm ls graceful-fs' to find it in the tree.
npm WARN deprecated [email protected]: ReDoS vulnerability parsing Set-Cookie https://nodesecurity.io/advisories/130
npm WARN engine [email protected]: wanted: {"node":">=6"} (current: {"node":"4.3.1","npm":"1.4.21"})

I would be most concerned about the ReDoS warnings and the deprecations.

Roadmap v0.1

Distilled from discussions at http://pad.okfn.org/p/labs-data-packages

Plan

We will converge on https://github.com/okfn/dpm (merge existing dpm2 into here ...)

Stage 1:

  • merge dpm2 with this repo - #9
  • dpm init - #4
  • dpm install {name or url} # just from url or git url ... - #3
  • support for datapackage.json style requirements in a normal project

Stage 2

  • web view
  • web create / init online
  • dpm view

Stage 3

  • registry - #5
  • dpm publish - #2
  • dpm install supports registry

Stage 4

Other changes / decisions

  • Reuse npm conceptually but probably not literally
  • datapackage.json rather than package.json
    • This is different so name should be different
    • What we have now
    • Different makes it easier to identify e.g. on github
    • we are not directly reusing npm so not a problem there (cf component.io stuff)
  • Semi-deprecate: datapackage.js
  • Copy over everything from dpm2 into dpm
  • Maybe split out later to:
    • dpm-init
    • dpm-install
    • ...

What We Want - Overview

Commands (in rough order of priority)

  • init (create)
    • (??) adddata (add a data file to the data package)
  • install
  • install with deps
  • view
  • publish [depends on registry]
  • validate [validate a data file against schema]

Note we want both command line and web so provide library commands that support web stuff (but dpm will focus on the command line tool)

Registry

To do a nice install in the "medium term" we want a registry, but it is not an immediate (next few weeks) priority. Registry / catalog needs:

  • post (publish)
  • get (install)
  • ownership

dpm install fails with proxy

dpm version 0.7.7, node 4.0.0

https_proxy=http://proxy.example.com:80 dpm install https://raw.githubusercontent.com/datasets/s-and-p-500-companies/master/datapackage.json
/usr/local/lib/node_modules/datapackage/lib/index.js:166
    if(err) return cb(err);
                   ^

ReferenceError: cb is not defined
    at Request._callback (/usr/local/lib/node_modules/datapackage/lib/index.js:166:20)
    at self.callback (/usr/local/lib/node_modules/datapackage/node_modules/request/request.js:123:22)
    at emitOne (events.js:77:13)
    at Request.emit (events.js:169:7)
    at ClientRequest.self.clientErrorHandler (/usr/local/lib/node_modules/datapackage/node_modules/request/request.js:232:10)
    at emitOne (events.js:77:13)
    at ClientRequest.emit (events.js:169:7)
    at TLSSocket.socketErrorListener (_http_client.js:259:9)
    at emitOne (events.js:77:13)
    at TLSSocket.emit (events.js:169:7)
    at emitErrorNT (net.js:1250:8)
    at doNTCallback2 (node.js:429:9)
    at process._tickCallback (node.js:343:17)

Recommended Node/NPM installation method?

Any suggestions about which of the many ways to install Node & NPM to use?
For example, using the Node/NPM packages in the Debian or Ubuntu repos, using a PPA, or compiling from source.
Similarly, are there any special considerations when installing DPM, or is "npm install dpm" ok?

Proposed Revisions

@danfowler requested that I get more involved here. I caution him to be more careful what he asks for, in future.

This is a short list of items that have come up as I worked through needs at dataship. Some involve changes for consistency, some are extensions.

sources

For consistency rename web to url. This is about using a consistent property name for URLs.
Same thing likely applies to author and contributor, but I haven't done anything with those yet.

homepage (string) -> web (object)

Change the homepage string to an object with a name property and a url property. Also rename it from homepage to web. This came up in the context of hosting datapackages for media companies (specifically five thirty eight). This is used to link back to a page published on their site that highlights the data. Likely to come up a lot when serving this segment.

license -> licenses

This needed to be an array for us.
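
Taken together, the three renames above could be applied to an existing descriptor roughly as follows. This is a sketch under assumptions: applyProposedRevisions is hypothetical, and using the package title (or name) as the name of the new web object is a guess, since the proposal does not say where that value comes from.

```javascript
// Hypothetical migration for the proposed property changes.
function applyProposedRevisions(descriptor) {
  var out = Object.assign({}, descriptor);

  // sources: rename `web` to `url` on each entry.
  if (Array.isArray(out.sources)) {
    out.sources = out.sources.map(function (source) {
      var copy = Object.assign({}, source);
      if (copy.web !== undefined && copy.url === undefined) {
        copy.url = copy.web;
        delete copy.web;
      }
      return copy;
    });
  }

  // homepage (string) -> web (object with name and url).
  if (typeof out.homepage === 'string') {
    out.web = { name: out.title || out.name, url: out.homepage }; // name is a guess
    delete out.homepage;
  }

  // license -> licenses (always an array).
  if (out.license !== undefined && out.licenses === undefined) {
    out.licenses = [].concat(out.license);
    delete out.license;
  }

  return out;
}

var revised = applyProposedRevisions({
  name: 'demo',
  homepage: 'http://example.com/story',
  license: 'PDDL-1.0',
  sources: [{ name: 'Example source', web: 'http://example.com/source' }]
});
console.log(JSON.stringify(revised, null, 2));
```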

created

This new field is an ISO 8601 string describing when the datapackage was created, in UTC.
2014-04-14T15:21:00.000Z
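
In JavaScript, Date.prototype.toISOString produces exactly this format and always reports UTC, so the field could be filled in as sketched below. withCreated is a hypothetical helper, not an existing function.

```javascript
// Months are zero-based in Date.UTC, so 3 means April.
var exampleCreated = new Date(Date.UTC(2014, 3, 14, 15, 21)).toISOString();
console.log(exampleCreated); // 2014-04-14T15:21:00.000Z

// Hypothetical helper: returns a copy of the descriptor stamped with the
// creation time (defaults to the current time).
function withCreated(descriptor, now) {
  return Object.assign({}, descriptor, {
    created: (now || new Date()).toISOString()
  });
}

console.log(withCreated({ name: 'demo' }).created);
```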

Happy to discuss any of these further.

[command] info - print out info on a DataPackage

Command to display info on a given DataPackage:

dpm info /path/to/datapackage/
dpm info gold-prices
dpm info http://data.okfn.org/data/core/gold-prices
dpm info https://github.com/datasets/gold-prices

Input:

  • path
  • url
  • github
  • single name (core dataset)

Output:

  • DP.json dump - print out datapackage.json (prettily)
  • Nice ASCII - human readable ascii version (to stdout)
  • HTML version (plus boot a web browser to show it to you) - basically what you get at http://data.okfn.org/tools/view

suggested aliases: show

Double listing of Procedural Wings

One of them calls itself procedural wings and the other procedural dynamics in the error message you get when trying to install one over the other: "The following inconsistecies were found:
ProceduralDynamics conflicts with ProceduralWings." They look identical in the mod listing.

where can i find datasets?

It would be nice to add an example of where datasets could be/where they are and how I could get them. Do I use the path to /dataprotocols.json ?

init / create command

As a User I want to create a datapackage.json

  • I want sensible defaults and assistance
    • generate defaults - look stuff up on disk etc
    • prompt based on defaults etc
  • I want data files automatically analysed
  • (?) I want data files automatically discovered and added (scan / globbing https://npmjs.org/package/glob)
  • I want to be able to easily update later
  • I want good help as to what these fields mean
  • I want to be able to generate from data files as urls

Interaction

Would be nice to have quite nice abstracted interaction / generation that would work both for command line and e.g. web. Would also be nice for interaction to be conditional on previous actions - e.g. prompting for filepath(s), urls of data files to add.

Here's a think-through of the case for adding a data file from a url:

  • ask you for a url
  • analyse the url
  • prompt you to confirm analysis
  • get response
  • update datapackage.json
  • prompt again ...

Implementation

Possibly reuse https://npmjs.org/package/init-package-json as per https://github.com/maxogden/datapackage-json

Searching for data packages

Where does the search command pull data from? I thought it was using datahub.io, but neither a search for plos nor the old example search iso returns any results.

adddata (addresource) command

Add a data resource to an existing data package ...

This is about adding a data file inside this data package not about installing a dependency (which is something max raises in thread below -- and is useful but a separate issue)

Bigquery support

More specifically, given a data package convert resource schemas to bigquery form and/or upload all my data to bigquery for me ...
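
The schema conversion half of this could be sketched as a simple type mapping. Everything here is an assumption for illustration: the mapping table is one plausible choice rather than an agreed one, toBigQuerySchema is hypothetical, unknown types fall back to STRING, and field names are not sanitised for BigQuery's column naming rules. Uploading the data is not covered.

```javascript
// Assumed mapping from JSON Table Schema field types to BigQuery column types.
var BIGQUERY_TYPES = {
  string: 'STRING',
  integer: 'INTEGER',
  number: 'FLOAT',
  boolean: 'BOOLEAN',
  date: 'DATE',
  time: 'TIME',
  datetime: 'TIMESTAMP'
};

// Converts {fields: [{name, type, constraints}]} into a BigQuery-style list
// of {name, type, mode} column definitions.
function toBigQuerySchema(tableSchema) {
  return tableSchema.fields.map(function (field) {
    var required = Boolean(field.constraints && field.constraints.required);
    return {
      name: field.name,
      type: BIGQUERY_TYPES[field.type] || 'STRING',
      mode: required ? 'REQUIRED' : 'NULLABLE'
    };
  });
}

var bigQueryColumns = toBigQuerySchema({
  fields: [
    { name: 'date', type: 'date', constraints: { required: true } },
    { name: 'price', type: 'number' },
    { name: 'note', type: 'any' }
  ]
});
console.log(bigQueryColumns);
```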
