dataexplorer's People

Contributors

aliounedia, andylolz, coderaiser, djw, fyears, michael, mk270, roll, rufuspollock

dataexplorer's Issues

[super] Scripts & Scripting including Editor, Storage and Execution

We currently have "transformations". Let's turn these into full scripting in plain JavaScript.

Implementation

  • Model stuff: scripts on projects etc - #44
  • script editor - #45
  • Script execution - #46
    • sandboxed (in an iframe or webworker)
    • live ...
      • security considerations ...
  • Integrate into UI - (cf #43)

Auto-Save script to local storage

Auto-save the script to local storage (after every keystroke, every 30s, after every run?) so that it can be restored later if the browser crashes or the window is closed.
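
A minimal sketch of how such an auto-save could work, assuming a debounce on keystrokes plus an explicit flush after each run. The names createAutoSaver and memoryStorage, and the storage key used below, are hypothetical and not part of Data Explorer; the storage argument only needs the getItem/setItem shape that window.localStorage has.

```javascript
// Hypothetical helper, not existing Data Explorer code.
function createAutoSaver(storage, key, delayMs) {
  let timer = null;
  let pending = null;

  // Write the most recent unsaved content, if any, and cancel the timer.
  function flush() {
    if (timer !== null) {
      clearTimeout(timer);
      timer = null;
    }
    if (pending !== null) {
      storage.setItem(key, pending);
      pending = null;
    }
  }

  return {
    // Call on every keystroke; the write happens after delayMs of quiet.
    onChange(content) {
      pending = content;
      if (timer !== null) clearTimeout(timer);
      timer = setTimeout(flush, delayMs);
    },
    // Call after every run (or when the window closes) to write immediately.
    flush,
    // Returns the last saved script, or null if nothing was saved.
    restore() {
      return storage.getItem(key);
    }
  };
}

// In-memory stand-in so the sketch also runs outside a browser.
function memoryStorage() {
  const data = new Map();
  return {
    getItem: (k) => (data.has(k) ? data.get(k) : null),
    setItem: (k, v) => { data.set(k, String(v)); }
  };
}
```

In a browser, passing window.localStorage instead of memoryStorage() would be the obvious choice; restore() would then be called once at startup to offer the saved draft back to the user.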

Braindump

Create a Project

As a User I want to create a Project and associate files to that project (online or local) so that I can reload that project later automatically

A project has:

  • ID, title, description, keywords
  • Resources
    • Data Files
    • Data APIs
  • Scripts
  • Apps/Views

Notes:

  • Do we really need multiple files?

Write scripts

As a User I want to write scripts for a project so that I can re-run those scripts later and thereby recreate the results (e.g. a specific visualization)

Specify type metadata

As a User I want to create type information about an object

Export data

As a User I want to export my data to an online service such as the DataHub

Notes:

  • I want my API key to be easily retrieved in a secure way
  • I want to have my login details remembered for next time so I don't have to re-add them ...
  • I want export to happen reasonably quickly and progress to be shown (bulk export)
  • I want upserts to happen when needed when object with that ID already exists
  • I want the connection of this data file with a given online store to be remembered so I can easily repeat this upload later
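
The upsert behaviour in the notes above can be illustrated locally. This is only a sketch, assuming records are plain objects with an "id" field; a real online store would normally perform the upsert on its side, and the function name upsert is hypothetical.

```javascript
// Hypothetical sketch: merge incoming records into existing ones by key.
// Records whose key already exists are updated; the others are inserted.
function upsert(existing, incoming, key = 'id') {
  const byKey = new Map(existing.map((rec) => [rec[key], rec]));
  let inserted = 0;
  let updated = 0;

  for (const rec of incoming) {
    if (byKey.has(rec[key])) {
      updated += 1;
    } else {
      inserted += 1;
    }
    // Fields from the incoming record win over the stored ones.
    byKey.set(rec[key], Object.assign({}, byKey.get(rec[key]), rec));
  }

  return { records: Array.from(byKey.values()), inserted, updated };
}
```

The inserted and updated counts could feed the progress display mentioned in the notes.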

Share with Others

As a User I want to Share my project with others so that they can see what I have done

Better UX

  • Run on all records notifies of success
  • Save shows spinner while waiting to complete

Script Execution

Execute scripts (part of #35). Strong connection with script editor #45

  • run in sandbox
    • show output
  • Run fully (what do you get to change?)

Context for script editor:

  • _ / lodash
  • $ (impossible for webworkers)
  • dataset
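
A rough sketch of how the context names above could be handed to a user script. It assumes the script body returns its output and that dataset has a "records" array; both are assumptions, as is the runScript name. Note that new Function gives no isolation at all, so this only illustrates the context passing; the sandboxing itself would still need an iframe or web worker.

```javascript
// Hypothetical sketch: expose named context values to a user script.
// WARNING: new Function is NOT a sandbox; it only shows context passing.
function runScript(source, context) {
  const names = Object.keys(context);
  const values = names.map((name) => context[name]);
  try {
    // Each context key becomes a parameter visible inside the script body.
    const fn = new Function(...names, '"use strict";\n' + source);
    return { ok: true, output: fn(...values) };
  } catch (err) {
    // Syntax errors and runtime errors both end up here.
    return { ok: false, error: String(err) };
  }
}
```

Returning an { ok, output } or { ok, error } object keeps "show output" simple: the UI can render either branch without its own try/catch.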

"Transformations" tab does not show until reloaded

Workflow (Google Chrome):

Create a project, go to Transform, be unhappy with it, click "My Projects", create a new project, click "Transform" -> the Transform tab does not show; the list view stays.

Workaround: reload, then select the project from My Projects.

Improve id generation and saving

  • Prepend dataexplorer when saving to localStorage (and don't include it in id itself)
  • Use meaningful names and just append '-{integer}' to avoid duplication ...
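
One way the two points above could look in code. This is a sketch under assumptions: the slug rules, the 'dataexplorer-' separator and the function names are all hypothetical.

```javascript
// Assumed prefix for localStorage keys; it is not part of the id itself.
const STORAGE_PREFIX = 'dataexplorer-';

// Turn a title into a readable, lowercase id fragment.
function slugify(title) {
  const slug = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
  return slug || 'project';
}

// Append '-{integer}' only when the plain slug is already taken.
function uniqueId(title, existingIds) {
  const base = slugify(title);
  if (!existingIds.includes(base)) return base;
  let n = 1;
  while (existingIds.includes(base + '-' + n)) n += 1;
  return base + '-' + n;
}

// The key used when saving to localStorage.
function storageKey(id) {
  return STORAGE_PREFIX + id;
}
```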

Persisting per-user Data Explorer config (incl e.g. list of projects)

For the time being this will just be the list of projects.

{
  projects: [
    {
      id: ...
      gist_id: ... # maybe the same
      state: active | deleted
    },
    ....
  ]
}

Persistence to special gist

Name: DataExplorerConfig.json

Boot sequence:

  • if not logged in: END
  • (if logged in) get all gists: http://developer.github.com/v3/gists/#list-gists
  • search for DataExplorerConfig.json
    • if it does not exist, create a local DataExplorerConfig model with an empty list of projects
    • if it does exist, load its data and initialize DataExplorerConfig with it

Persistence is automatic on each change ...
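
The lookup and initialization steps of the boot sequence could be sketched as below. It assumes the list of gists has already been fetched and that each gist carries an "id" and a "files" object keyed by filename; fetching the file content itself is left out. The function names are hypothetical.

```javascript
const CONFIG_FILENAME = 'DataExplorerConfig.json';

// Return the first gist containing the config file, or null if none does.
function findConfigGist(gists) {
  const found = gists.find((gist) =>
    Object.prototype.hasOwnProperty.call(gist.files || {}, CONFIG_FILENAME)
  );
  return found || null;
}

// content: text of DataExplorerConfig.json, or null when no gist was found.
function initConfig(content) {
  if (content === null) {
    // No config yet: start with an empty list of projects.
    return { projects: [] };
  }
  const parsed = JSON.parse(content);
  return { projects: Array.isArray(parsed.projects) ? parsed.projects : [] };
}
```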

[super] Project objects

Project objects encapsulating a given activity around a dataset

  • Dataset (or source thereof)
  • Scripts
  • Views (graphs, maps etc with config)
  • Export destinations

Import/Export and Data workflow

This is an overview of how a user usually works with data (see attached diagram). Many formats and data services exist, so a modular architecture is needed for maximum flexibility and, in turn, the most useful user experience.

Data can generally be either serialized into a file and stored somewhere, or accessed through APIs.

data-workflow-diagram.png

System components

  • Importers/Exporters
    • Backends - transfer format is dictated by API, needs credential management
    • Remote File - probably some proxy needed for cross origin, needs optional credential management
    • Local file - File API, Drag and Drop
    • Clipboard - using textareas or clipboard libraries
  • Service detector
    • For remote services, it is most convenient to just specify a URL and let the system guess the service from it, e.g. whether it is GDocs, a CKAN dataset, etc. Prompt for more details only if necessary.
  • Format detector for deserialization
    • Similar to service detector but for formats
  • Deserializers
    • for each format, e.g. csv, json, xml
    • provide auto-detection with reasonable defaults, ask user only if necessary
  • Serializers
    • text based, e.g. csv, json, xml, (xls?)
    • image based - canvas, svg, export to bitmap, pdf
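
The service detector and format detector could start as simply as the sketch below. The URL patterns are illustrative guesses rather than a tested rule set, and the function names and the 'remote-file' fallback label are hypothetical.

```javascript
// Guess the service from the URL alone (illustrative patterns only).
function detectService(url) {
  const { hostname, pathname } = new URL(url);
  if (hostname === 'docs.google.com' && pathname.startsWith('/spreadsheet')) {
    return 'gdocs';
  }
  if (pathname.includes('/dataset/')) {
    return 'ckan';
  }
  // Fall back to treating the URL as a plain remote file.
  return 'remote-file';
}

// Guess the format from the file extension; null means "ask the user".
function detectFormat(url) {
  const { pathname } = new URL(url);
  const match = /\.([a-z0-9]+)$/i.exec(pathname);
  const ext = match ? match[1].toLowerCase() : null;
  return ['csv', 'json', 'xml', 'xls', 'ods'].includes(ext) ? ext : null;
}
```

A null result from detectFormat is the point where the user would be prompted, in line with "ask user only if necessary".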

Formats

  • text based - csv, json, xml, HTML tables
  • binary - xls, ods
  • maps, graphs - images - png, jpg, svg, pdf

Backends (as in Recline.js)

  • ckan
  • couchdb
  • csv
  • dataproxy
  • elasticsearch
  • gdocs
  • memory
  • solr

CSV uses the memory backend; it is not logically a backend, just a format. A file/document backend with a given format would therefore be more flexible.

Auto-detection

The user should be asked for additional input as little as possible; reasonable defaults or auto-detection should be used instead. For exporting, a live preview of part of the data should be available.

Backends need data from the user to specify credentials or format options. That data can be viewed simply as a JSON object, so a general form-building library such as Alpaca (based on JSON Schema) or Backbone-Forms can be used. The advantage of this approach is that adding a new format or backend does not require writing UI-related code.

Operation stack

Instead of just exporting and saving static data, it would be convenient to offer the option of sharing application state (the stack of applied operations, queries, transformations, visualization options, etc.) via a URL. This makes sharing easy, and if data is corrected in the original source, all derived data would appear corrected as well.
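
As a sketch of the idea, application state could be packed into the URL fragment as JSON. The '#state=' marker, the function names and the shape of the state object are assumptions for illustration only.

```javascript
const STATE_MARKER = '#state=';

// Append the serialized state to a base URL as a fragment.
function encodeState(baseUrl, state) {
  return baseUrl + STATE_MARKER + encodeURIComponent(JSON.stringify(state));
}

// Recover the state from a shared URL; null when the URL carries none.
function decodeState(url) {
  const at = url.indexOf(STATE_MARKER);
  if (at === -1) return null;
  return JSON.parse(decodeURIComponent(url.slice(at + STATE_MARKER.length)));
}
```

Because only the source location and the operations are stored, opening the URL later re-reads the source, which is what lets corrections upstream show up in derived data.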

Related

recline issues:

dataexplorer issues:

Next steps

  • discussion
  • sketches of screens for DataExplorer using above architecture
  • propose class structure for additional Recline.js functionality

Save and load scripts

We should be able to save and load cleanup scripts from GitHub

  • Save of scripts should be to a gist by default (later we can add choice of save location)
    • If we loaded the script we should remember that (localStorage or a cookie) and then save back to there
  • Load scripts - specify location similar to specification of data location
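
A sketch of the request body for saving a script as a gist. It assumes a script object with id, content and language fields, and it uses the description/public/files shape of the GitHub "create a gist" request. The gistPayload name, the filename rule and the choice of a private gist are assumptions.

```javascript
// Build the JSON body for creating a gist from a script object.
// script: { id, content, language } (assumed shape).
function gistPayload(script, description) {
  // Assumed filename rule: script id plus an extension for the language.
  const extension = script.language === 'javascript' ? '.js' : '.txt';
  const files = {};
  files[script.id + extension] = { content: script.content };
  return {
    description: description || '',
    public: false, // assumption: keep scripts private by default
    files: files
  };
}
```

The id of the gist returned by the save would be the thing to remember (in localStorage or a cookie) so that later saves go back to the same place.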

Scripts in Model

Part of #35 (scripts & scripting)

Implementation

Should look like a gist pretty much :-)

{
  # aka name (but unique)
  id: ...
  # the content of the script 
  content: 
  language: javascript
}

Possible for the future

  # e.g. transform, standard ...
  type: ...
  # for remote scripts (i.e. ones you import and reuse)
  url: 

Functional tests

Getting to the point where development will become unsustainable without tests ...

Undo support (?)

Doubt this is needed but worth recording anyway.

Not needed because you could just reload the source data and re-run the script ...

Configurable save

  • Save by default to source from which we loaded
    • Requires that backend is writable - not so for CSV from disk and online CSV (??)
  • Could just allow this to be configurable - so you can choose from github or ...

Support gdocs as backend

This would be awesome with gdocs as a backend!

  • Read support is almost trivial (get this straight from recline)
  • Write support - now this is interesting and I (@rgrp) have thought about this a lot - see below for summary

Write Support to GDocs in JS

Google now uses OAuth. This is normally a PITA to support (witness the hassle of getting login to GitHub via OAuth), but Google specifically supports client-side apps:

The Google OAuth 2.0 Authorization Server supports JavaScript applications (JavaScript running in a browser). Like the other scenarios, this one begins by redirecting a browser (popup, or full-page if needed) to a Google URL with a set of query string parameters that indicate the type of Google API access the application requires. Google handles the user authentication, session selection, and user consent. The result is an access token. The client should then validate the token. After validation, the client includes the access token in a Google API request.

To find out more, we need the Google docs on OAuth for client-side apps.

Links

[super] Data Cleaning Examples

General Thoughts

  • Many useful examples require ability to load multiple datasets => we must be able to load remote data as part of the scripting.
    • More strongly: does a focus on a single dataset in a project make sense? Refine does that but ...
  • Geocoding also requires external access

Scripting Library

For scripting to be really useful we need some standard functions

  • plot(dataset, config, name) - #69
  • loadData(urlOrConfig, ...).done(function(dataObject) {}) - could we just use bits of recline atm? - #74
  • geocode - #68
  • saveDataset
  • direct xhr ... - #66

External access

=> We need an ajax library - see #66

Use Cases

Cleaning

Merging / Transforming

Miscellaneous

  • Doing sums ... (how useful ...)
  • Binning (pivot tables ...)
