ckanr's Introduction

ckanr

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

ckanr is an R client for the CKAN API.

Description

CKAN is an open source set of tools for hosting and providing data on the web. CKAN users could include non-profits, museums, local city/county governments, etc.

ckanr allows users to interact with those CKAN websites to create, modify, and manage datasets, as well as search and download pre-existing data, and then to use that data in R for analysis (stats/plotting/etc.). It is meant to be as general as possible, allowing you to work with any CKAN instance.

Get started: https://docs.ropensci.org/ckanr/

Installation

Stable CRAN version

install.packages("ckanr")

Development version

install.packages("remotes")
remotes::install_github("ropensci/ckanr")
library('ckanr')

Note: the default base CKAN URL is set to https://data.ontario.ca/. Functions requiring write permissions in CKAN additionally require a privileged CKAN API key. You can change the default URL and key using ckanr_setup(), or override the URL using the url parameter in each function call. To set one or both, run:

ckanr_setup() # restores default CKAN url to https://data.ontario.ca/
ckanr_setup(url = "https://data.ontario.ca/")
ckanr_setup(url = "https://data.ontario.ca/", key = "my-ckan-api-key")

ckanr package API

There is a suite of CKAN objects (package, resource, etc.), each with its own set of functions in this package. Most functions for a given CKAN object return an S3 class, and that object can be passed to most other functions (this also facilitates piping). The following list gives, for each function group, the CKAN object, the prefix of the functions that work with it, and the name of the S3 class:

  • Packages (aka datasets) - package_*() - ckan_package
  • Resources - resource_*() - ckan_resource
  • Related items - related_*() - ckan_related
  • Users - user_*() - ckan_user
  • Groups - group_*() - ckan_group
  • Tags - tag_*() - ckan_tag
  • Organizations - organization_*() - ckan_organization

The S3 class objects all look very similar; for example:

<CKAN Resource> 8abc92ad-7379-4fb8-bba0-549f38a26ddb
  Name: Data From Digital Portal
  Description:
  Creator/Modified: 2015-08-18T19:20:59.732601 / 2015-08-18T19:20:59.657943
  Size:
  Format: CSV

All classes state the type of object, have the ID to the right of the type, then have a varying set of key-value fields deemed important. This printed object is just a summary of an R list, so you can index to specific values (e.g., result$description). If you feel there are important fields left out of these printed summaries, let us know.
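The idea that the printed summary is only a view of an underlying list can be illustrated with a minimal base R sketch. The constructor new_demo_resource() and the class name demo_resource are hypothetical stand-ins and are not ckanr's actual implementation; the field values are copied from the printed example above.

```r
# Hypothetical constructor: wrap a plain list in an S3 class.
new_demo_resource <- function(x) {
  structure(x, class = "demo_resource")
}

# Print method that shows only a few fields; the full list stays intact.
print.demo_resource <- function(x, ...) {
  cat("<Demo Resource> ", x$id, "\n", sep = "")
  cat("  Name: ", x$name, "\n", sep = "")
  cat("  Format: ", x$format, "\n", sep = "")
  invisible(x)
}

res <- new_demo_resource(list(
  id = "8abc92ad-7379-4fb8-bba0-549f38a26ddb",
  name = "Data From Digital Portal",
  description = "",
  format = "CSV"
))

print(res)        # prints the three-line summary
res$description   # every field remains accessible by name
```

Because the class is only an attribute on a list, ordinary list indexing keeps working regardless of which fields the print method chooses to display.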

Note: many examples are abbreviated to keep the README short.

Contributors

(alphabetical)

  • Florian Mayer
  • Francisco Alves
  • Imanuel Costigan
  • Scott Chamberlain
  • Sharla Gelfand
  • Wush Wu

Meta

  • Please report any issues or bugs.
  • License: MIT
  • Get citation information for ckanr in R by running citation(package = 'ckanr')
  • Please note that this package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

ckanr's People

Contributors

dickoa, fjuniorr, florianm, galalh, hannaboe, imanuelcostigan, jeroen, katieroserice, ltla, mattfullerton, nicholsn, nn-at, patlittle, sckott, sharlagelfand, wush978

ckanr's Issues

Any package API changes?

@florianm @wush978 @imanuelcostigan

I would like to get the first version on CRAN soon so we can get it in front of more eyes, to get more use cases, feedback on the pkg. Do you notice anything that should be changed in the package API before we do this? Ideally I'd rather not change anything major after getting on CRAN so as not to break people's code.

organization related features

Dear maintainers,

I am a co-founder of a start-up which provides tutorial services for data science. We chose CKAN as the management system for our example datasets, and we chose R as the major toolset in our product. Thanks for your great work, which lets us integrate our tutorials with CKAN more easily.

Some of our use cases require API features that ckanr does not implement: for example, the organization-related features, decoding of printed Unicode escape codes, and authentication with an API key.

I am also an R package developer, so it is easy for me to customize ckanr for our needs. However, I would like to discuss whether I could contribute these features back to the package, starting, for example, with the organization-related features.

Thanks,
Wush

dplyr interface of `ds_search_sql`

Dear @sckott ,

I think it is possible to implement a subset of the dplyr interface for ckanr. I guess we would need to use the engine in dplyr to generate the SQL statement and submit that statement to the CKAN server via ds_search_sql. Does this feature fit this package?

Use of POST-Requests instead of GET

Some functions (e.g. package_list and package_search) use POST requests (ckan_POST) although they would only need ckan_GET (which doesn't seem to be used at all in ckanr).

This isn't a problem for most CKAN instances. However, Zurich's Open Data Portal blocks all POST requests from outside the city's internal network. Using only POST requests therefore makes it impossible to use ckanr with this data portal.
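To make the GET alternative concrete, here is a minimal base R sketch of how a read-only action could be expressed as a plain URL with query parameters. The helper build_action_url() is hypothetical and not part of ckanr; the /api/3/action/ path is CKAN's action API route, and the sketch assumes the target instance accepts GET for read-only actions such as package_list.

```r
# Hypothetical helper: build a GET URL for a CKAN action.
build_action_url <- function(base_url, action, params = list()) {
  base_url <- sub("/+$", "", base_url)        # drop trailing slashes
  url <- paste0(base_url, "/api/3/action/", action)
  params <- Filter(Negate(is.null), params)   # skip unset parameters
  if (length(params) == 0) return(url)
  # Percent-encode each value so characters such as ":" are safe in a query
  values <- vapply(
    params,
    function(p) utils::URLencode(as.character(p), reserved = TRUE),
    character(1)
  )
  paste0(url, "?", paste0(names(params), "=", values, collapse = "&"))
}

build_action_url("https://data.ontario.ca/", "package_list",
                 list(limit = 5, offset = 0))
```

A URL built this way can be fetched with any HTTP client's GET function, which sidesteps firewalls that only block POST.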

Test against other CKAN versions

Currently testing against what I think is the latest CKAN version.

Test also against at least two other versions. Could simply have additional test files in test suite for other CKAN versions at different installations, or perhaps some other setup

What values to use for testing?

@florianm what values were you using for testing? for the url, did, and rid? I can get a key or use one i have (i'll figure that out) - I need to set these env vars on Travis as well.

The url is changed during testing

In as.ckan_group, the url will be changed during testing.

I am testing the new API and I set a test_url in ckanr_setup. However, after calling group_list, the new S3 api requires some information from group_show without passing the url to group_show. Therefore, group_show uses the get_default_url.

Ability to change default base URL

This could for example be implemented using a session option:

set_ckan_url <- function (url = "http://data.techno-science.ca") options(ckan.default.url = url)
get_ckan_url <- function () getOption("ckan.default.url")

The output of get_ckan_url() would then be passed to other ckanr functions, so that their interfaces would be modified, for example, as follows:

package_list(offset = 0, limit = 31, url = get_ckan_url(), as = "list", ...)

I'll submit a pull request.

Which CKAN API version does ckanR implement?

Apologies for this silly question - are we writing against 2.4a, latest (and keep upgrading), or last stable? I'm testing locally on my own branch based on CKAN latest (2.4a) with a few custom plugins (possible problem: custom data schema using ckanext-scheming), and Travis CI is configured to use my public-facing instance http://data-demo-dpaw.wa.gov.au/ (at least for writing tests).

Also, how do I label this issue as question?

Download files

Files can be lots of things (e.g., csv, xml, jpg, etc.) so this needs to cover various cases

Looks as though there's no proper way in CKAN API to download a file, just have to GET() it i guess

Error in resource_create

I was able to run all resource_*() functions with the exception of resource_create(). I did the following:

resource_create(package_id= package_id,
                rcurl=url,
                description="newdata",
                name="newdataset",
                as="table")

It returns the error "Error in file.exists(path) : invalid 'file' argument" no matter what rcurl value I set (local csv path, cloud csv path, existing resource url).

`resource_show` and `package_show` parameters different to CKAN API 2.4a's

According to the CKAN 2.4a API guide, both resource_show and package_show take the following parameters:

  • id (resource ID),
  • include_tracking = FALSE,
  • package_show only: use_default_schema (bool) – use default package schema instead of a custom schema defined with an IDatasetForm plugin (default: False),

while the API key is not required.

The function documentation is out of date, and R CMD check flags two undocumented arguments:

* checking Rd \usage sections ... WARNING
Undocumented arguments in documentation object 'package_show'
  ‘key’
Undocumented arguments in documentation object 'resource_show'
  ‘key’

Could I suggest to align ckanR's resource_show and package_show parameters and docstrings to CKAN's API spec? (see also PR #35 )

Fix ds_search

  • examples are giving NULL back
  • convert to GET instead of POST probably

S3 classes for packages, resources, any other objects

Before we send the first version to CRAN, it makes sense to consider creating lightweight S3 classes to wrap the data returned for packages and resources, and for any other common objects that the CKAN API returns.

e.g., a S3 class for packages could look like this when printed:

<package> 27778230-2e90-4818-9f00-bbf778c8fa09
  name: foo-bar
  revision_timestamp: 2014-10-28T18:13:22.213530 
  description: Data dictionary for CSTMC artifact datasets
  size: 20 mb
  format: XLS

Those things printed could be easily modified based on consensus opinion here.

In addition to how packages and resource objects are printed to the screen, functions that perform operations on packages/resources could accept an object of class e.g., ckan_pkg or ckan_res, respectively, in addition to accepting a package or resource ID - This makes it more brainless (easier) to do simultaneous operations on packages or resources.
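As a rough illustration of functions that accept either an ID or a classed object, the following base R sketch uses S3 dispatch. The generic get_id(), the class demo_pkg, and describe_pkg() are all hypothetical names chosen for this example; they are not ckanr functions.

```r
# Hypothetical generic: return an ID from either a character ID
# or an object that carries one in its $id field.
get_id <- function(x, ...) UseMethod("get_id")
get_id.character <- function(x, ...) x
get_id.demo_pkg <- function(x, ...) x$id

# Any function written against get_id() accepts both forms.
describe_pkg <- function(x) paste("operating on package", get_id(x))

pkg <- structure(
  list(id = "27778230-2e90-4818-9f00-bbf778c8fa09", name = "foo-bar"),
  class = "demo_pkg"
)

describe_pkg(pkg)                                     # classed object
describe_pkg("27778230-2e90-4818-9f00-bbf778c8fa09")  # plain ID
```

Routing every operation through one small generic keeps the object-or-ID logic in a single place instead of repeating it in each function.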

err_handler produces error

Dear @sckott ,

The err_handler in 7d279d2#diff-e549505eb95036528ca3b125f62915a6R29 produces the error "Error in content(x)$error : object of type 'externalptr' is not subsettable" when I try to access a local CKAN server.

The reason is that the returned 404 HTTP response contains a web page, which httr::content (7d279d2#diff-e549505eb95036528ca3b125f62915a6R31) parses as an HTMLInternalDocument, and subsetting that object with $error produces this error.

I saved the response object here: https://www.dropbox.com/s/06rqjjo13cwo0tz/404.Rds?dl=0.

Wush
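One way to guard against this failure mode is to check the type of the parsed body before subsetting it. The sketch below is base R only and uses a hypothetical helper, safe_error_message(); it is not the package's err_handler, and the shape of the JSON error body (an error element containing a message) is an assumption made for illustration.

```r
# Hypothetical helper: extract an error message from a parsed response body.
# `parsed` is whatever the HTTP client returned after parsing: a list for
# JSON, or some other object (for example an HTML document) otherwise.
safe_error_message <- function(parsed, status) {
  if (is.list(parsed) && !is.null(parsed$error)) {
    err <- parsed$error
    msg <- if (is.list(err) && !is.null(err$message)) err$message else err
    return(paste0(status, " - ", paste(unlist(msg), collapse = "; ")))
  }
  # Non-list body: never subset it, fall back to the status code alone
  paste0(status, " - server returned a non-JSON error body")
}

# JSON-style error body
safe_error_message(list(success = FALSE,
                        error = list(message = "Not found")), 404)

# Non-list body, standing in for a parsed HTML page
safe_error_message(new.env(), 404)
```

The first call returns "404 - Not found"; the second returns the fallback text instead of failing on $error.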

tag_show tests fail after CKAN API change

As of early June 2015, CKAN's tag_show defaults to omitting datasets from the result. ckanr's tests assume include_datasets=TRUE.

Suggestion: include explicit parameter include_datasets=FALSE in tag_show.R and adjust tests to use include_datasets=TRUE.

missing methods

I think we need db_query_fields and db_query_rows methods, and possibly others, but that's where I got errors trying out examples. I haven't been able to figure those out yet: I'm getting errors on fetch()/dbFetch(), and it's not clear why it's not working.

I also wonder whether we should be developing against the dev version of dplyr or the stable one, since there are some changes there. I'm forcing dev dplyr for now and can easily switch back.

cc @wush978

Implement resource_update

I have an R object, e.g. a data.frame (to become a CSV table), a ggplot2 object (to become a PDF figure), a text string (to become a TXT file), and a write-permitted CKAN API key for a CKAN instance.

I want to upload each object to replace an existing CKAN resource, with or without first writing the objects to files.

This should work both locally and when used e.g. inside an RShiny running on an RShiny server.

This is the desired functionality in Python (my own code):

#!/usr/bin/env python
import os
from datetime import datetime

import requests

DC = "http://my-ckan-instance.org/"
API_URL = "{0}api/3/action/".format(DC)


def resource_show(resource_id, api_url=API_URL):
    """Return a dictionary of resource details for a given resource ID.

    :param resource_id: A CKAN resource ID
    :param api_url: A live CKAN API URL, default: "<DC>/api/3/action/"

    :return: a dictionary of resource details, or None if the request failed
    """
    r = requests.get("{0}resource_show".format(api_url),
                     params={"id": resource_id})
    if r.status_code == 200:
        return r.json()["result"]
    return None


def resource_update(filedir, res_id, filepath, api_url=API_URL, api_key=None):
    """Update the file attachment of a given resource ID.

    :param filedir: The directory that contains the file
    :param res_id: The resource ID of an existing resource
    :param filepath: The path, relative to filedir, of the file to upload
    :param api_url: A live CKAN API URL, default: "<DC>/api/3/action/"
    :param api_key: A write-privileged API key (sensitive data)

    :return: None
    """
    res = resource_show(res_id, api_url=api_url)
    if res is None:
        print("Resource {0} not found, skipping upload".format(res_id))
        return
    res["state"] = "active"
    res["last_modified"] = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

    path = os.path.join(filedir, filepath)
    if os.path.isfile(path):
        # Open in binary mode; the context manager closes the file afterwards
        with open(path, "rb") as upload:
            requests.post("{0}resource_update".format(api_url),
                          data=res,
                          headers={"Authorization": api_key},
                          files=[("upload", upload)])
        print("Uploaded {0}".format(filepath))
    else:
        print("File {0} not found, skipping upload".format(filepath))

API-Key

Dear maintainers,

The CKAN platform allows authentication and authorization with an API key in the HTTP request.

I noticed that we could send the API key through add_headers with the header X-CKAN-API-Key to authenticate with the CKAN server. The main problem, to me, is finding a secure way to let users supply their API key.

I propose two ways to let users supply their API key.

The first way is to let them set an option in .Rprofile or somewhere similar. That is to say, we could read the key with getOption. That way, the user does not need to type their API key.

The other way is to receive the input through svDialogs::dlgInput, where the input will not be logged in the .history file.

What do you think?

Thanks,
Wush
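The option-based approach could be sketched in base R as follows. The function find_api_key(), the option name demo.ckan.key, and the environment variable DEMO_CKAN_KEY are all placeholders invented for this example; they are not names that ckanr is known to use. The environment-variable fallback is an extra assumption beyond the two proposals above.

```r
# Hypothetical lookup order: explicit argument, then an R option,
# then an environment variable.
find_api_key <- function(key = NULL) {
  if (!is.null(key)) return(key)
  opt <- getOption("demo.ckan.key")
  if (!is.null(opt)) return(opt)
  env <- Sys.getenv("DEMO_CKAN_KEY", unset = NA)
  if (!is.na(env) && nzchar(env)) return(env)
  stop("No CKAN API key found; set one in .Rprofile or .Renviron",
       call. = FALSE)
}

options(demo.ckan.key = "my-ckan-api-key")  # e.g. placed in .Rprofile
find_api_key()
```

With the key stored in a startup file, it never has to appear in scripts or in the interactive command history.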

Different behavior of travis-ci

Dear @sckott ,

I notice that the result of travis-ci of wush978/ckanr is different from travis-ci of ropensci/ckanr. Mine always stops at the beginning due to: "sudo: must be setuid root". Could you tell me whether you did any additional configuration in travis-ci?

Thanks,
Wush

resource description

Is there a way to define resource description with ds_create_dataset? Thank you

default parameter of resource_search

Dear @sckott ,

The default value of parameter q of resource_search does not make sense. First, q must be given because there is no default value given in the definition of resource_search. Second, *:* does not work with resource_search.

The q="name:data" works so I will use it for testing.

user_list does not work

Dear @sckott ,

The user_list does not work properly with default value:

> a <- user_list()
 Error: length(names(body)) > 0 is not TRUE 
8 stop(sprintf(ngettext(length(r), "%s is not TRUE", "%s are not all TRUE"), 
    ch), call. = FALSE, domain = NA) 
7 stopifnot(length(names(body)) > 0) 
6 body_config(body, encode) 
5 perform(handle, writer, method, opts, body) 
4 make_request("post", hu$handle, hu$url, config, body_config(body, 
    encode)) 
3 POST(file.path(url, ck(), method), body = body, ...) at zzz.R#23
2 ckan_POST(url, "user_list", body = body, ...) at user_list.R#14
1 user_list() 

I guess the reason is that both q and order_by are NULL.
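If the guess is right, the failure comes from building a request body out of parameters that are all NULL. A minimal base R sketch of one possible guard is shown below; compact_body() is a hypothetical helper, and the sketch assumes the HTTP client accepts an absent (NULL) body.

```r
# Hypothetical helper: drop NULL entries from a request body and
# return NULL when nothing is left, so the caller can skip the body.
compact_body <- function(body) {
  body <- Filter(Negate(is.null), body)
  if (length(body) == 0) NULL else body
}

compact_body(list(q = NULL, order_by = NULL))    # NULL: no body to send
compact_body(list(q = "wush", order_by = NULL))  # list(q = "wush")
```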
