rufuspollock-okfn / dataexplorer

View, visualize, clean and process data in the browser.

Home Page: http://explorer.okfnlabs.org

CSS 11.35% JavaScript 80.78% HTML 7.86%

dataexplorer's Issues

Script Editor

Part of #35 (Scripts & Scripting)

Use CodeMirror here. Some connection with script execution (#46) as part of same UI?

Some nice extras:

Help / tutorial (??)
Autocomplete - http://codemirror.net/2/demo/complete.html

URLs for individual projects (at e.g. project/{id})

Also: rename dataset view to project view.

[super] Scripts & Scripting including Editor, Storage and Execution

Currently have "transformations". Let's turn this into full-on scripting in the form of full JS.

Would still be nice if you could do simple map / reduce (rather than having to wrap this laboriously yourself ...)
Maybe need some concepts of types of transform ... cf https://github.com/okfn/dataexplorer/blob/master/doc/design.markdown

Implementation

Model stuff: scripts on projects etc - #44
script editor - #45
Script execution - #46
- sandboxed (in an iframe or webworker)
- live ...
  - security considerations ...
Integrate into UI - (cf #43)

Data viewer based on multiview and includes map and slickgrid grid

Switch to using full Recline multiview so we have full viewer including:

map
slickgrid grid
search / filter

Show load view or past projects on startup

Don't have login as default screen

Transformations uses CodeMirror

Have nice code editing using codemirror. Even better would be to have a Run button to try out the code.

Auto-Save script to local storage

Auto-save the script to local storage (after every keystroke, every 30s, after every run?) so that if the browser crashes or you close the window you can restore it later.
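
One possible shape for this, as a minimal sketch: debounce the save so a write happens shortly after typing pauses, and save immediately after a run. The name `createAutoSaver`, the storage key and the delay are illustrative assumptions, not existing Data Explorer code. Storage is passed in, so `window.localStorage` can be supplied in the browser.

```javascript
// Debounced auto-save: write the latest script text once typing pauses.
// `storage` is anything with setItem/getItem (e.g. window.localStorage).
function createAutoSaver(storage, key, delayMs) {
  var timer = null;
  var pending = null;

  // Write the pending text (if any) right away and cancel the timer.
  function flush() {
    if (timer !== null) {
      clearTimeout(timer);
      timer = null;
    }
    if (pending !== null) {
      storage.setItem(key, pending);
      pending = null;
    }
  }

  return {
    // Call on every editor change; only the last value within delayMs is written.
    schedule: function (text) {
      pending = text;
      if (timer !== null) clearTimeout(timer);
      timer = setTimeout(flush, delayMs);
    },
    // Call after "Run" or before the page unloads to save immediately.
    flush: flush,
    // Call on startup to restore a draft; returns whatever the storage holds.
    restore: function () {
      return storage.getItem(key);
    }
  };
}
```

In the browser this might be wired up as `createAutoSaver(window.localStorage, 'dataexplorer-script-draft', 500)`, with `schedule` called from the editor's change handler.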

Create a Project

As a User I want to create a Project and associate files with that project (online or local) so that I can reload that project later automatically

A project has:

ID, title, description, keywords
Resources
- Data Files
- Data APIs
Scripts
Apps/Views

Notes:

Do we really need multiple files?

Write scripts

As a User I want to write scripts for a project so that I can re-run those scripts later and thereby recreate the results (e.g. a specific visualization)

Specify type metadata

As a User I want to create type information about an object

Export data

As a User I want to export my data to an online service such as the DataHub

Notes:

I want my API key to be easily retrieved in a secure way
I want to have my login details remembered for next time so I don't have to re-add them ...
I want export to happen reasonably quickly and progress to be shown (bulk export)
I want upserts to happen when needed when object with that ID already exists
I want the connection of this data file with a given online store to be remembered so I can easily repeat this upload later
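
The upsert requirement in the notes above can be illustrated locally, as a minimal sketch. It shows only the merge semantics (update when the ID exists, insert otherwise); how the DataHub or any other store handles this is not covered here. `upsertRecords` and `idField` are hypothetical names.

```javascript
// Merge incoming records into existing ones, keyed by an id field.
// Records whose id already exists are updated; the rest are inserted.
function upsertRecords(existing, incoming, idField) {
  var byId = new Map();
  existing.forEach(function (rec) { byId.set(rec[idField], rec); });

  var inserted = 0;
  var updated = 0;
  incoming.forEach(function (rec) {
    var id = rec[idField];
    if (byId.has(id)) {
      // Incoming fields win; fields missing from the update are kept.
      byId.set(id, Object.assign({}, byId.get(id), rec));
      updated += 1;
    } else {
      byId.set(id, rec);
      inserted += 1;
    }
  });

  return {
    records: Array.from(byId.values()),
    inserted: inserted,
    updated: updated
  };
}
```

The `inserted` and `updated` counts could feed the progress display mentioned in the same notes.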

Share with Others

As a User I want to Share my project with others so that they can see what I have done

Support configuring files to load from url

e.g. ?backend=csv&url=....
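
A minimal sketch of reading such a query string with the standard `URLSearchParams` API. The function name and the fallback to `csv` when `backend` is missing are assumptions for illustration.

```javascript
// Read the data source configuration from a query string such as
// "?backend=csv&url=https%3A%2F%2Fexample.com%2Fdata.csv".
function parseSourceFromQuery(search) {
  var params = new URLSearchParams(search);
  var url = params.get('url');
  if (!url) return null; // nothing to load
  return {
    backend: params.get('backend') || 'csv', // default is an assumption
    url: url
  };
}
```

In the browser this would typically be called as `parseSourceFromQuery(window.location.search)`.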

Better UX

Run on all records notifies of success
Save shows spinner while waiting to complete

change github oauth client settings

this link: https://github.com/login/oauth/authorize?client_id=2bab62e2f6b27c3ebe1f&scope=repo,%20user

redirects to /src/transformer/?code=58296c98c7388a0a5cfb

but it should redirect to ?code=58296c98c7388a0a5cfb

i think you have to change the github app settings for 2bab62e2f6b27c3ebe1f

Get save working again to github

Script Execution

Execute scripts (part of #35). Strong connection with script editor #45

run in sandbox
- show output
Run fully (what do you get to change?)

Context for script editor:

_ / lodash
$ (impossible for webworkers)
dataset
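
A rough sketch of the web worker option, assuming the script is a single function of a record, as in the current transformations. `buildWorkerSource` and `runInWorker` are hypothetical names. Records reach the worker as copies because `postMessage` clones its payload, and `$` is unavailable because workers have no DOM. A library such as lodash could probably be pulled in with `importScripts`, which is not shown.

```javascript
// Build the source text of a worker that applies a user transform to each record.
function buildWorkerSource(userScript) {
  return [
    'var transform = (' + userScript + ');',
    'self.onmessage = function (event) {',
    '  try {',
    '    var records = event.data.records.map(function (r) { return transform(r); });',
    '    self.postMessage({ ok: true, records: records });',
    '  } catch (err) {',
    '    self.postMessage({ ok: false, error: String(err) });',
    '  }',
    '};'
  ].join('\n');
}

// Browser only: run the script in a throwaway worker and resolve with the output.
function runInWorker(userScript, records) {
  return new Promise(function (resolve, reject) {
    var blob = new Blob([buildWorkerSource(userScript)], { type: 'text/javascript' });
    var url = URL.createObjectURL(blob);
    var worker = new Worker(url);

    function cleanUp() {
      worker.terminate();
      URL.revokeObjectURL(url);
    }

    worker.onmessage = function (event) {
      cleanUp();
      if (event.data.ok) resolve(event.data.records);
      else reject(new Error(event.data.error));
    };
    // Fires for errors raised while loading the script, e.g. syntax errors.
    worker.onerror = function (event) {
      cleanUp();
      reject(new Error(event.message));
    };
    worker.postMessage({ records: records });
  });
}
```

A worker keeps a slow or broken script off the UI thread, but it can still make network requests, so this alone does not settle the security considerations raised in #35.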

"Transformations" tab does not show until reloaded

Workflow: (google chrome)

Create a project and go to Transform. Then click "My Projects", create a new project and click "Transform" -> Transform does not show; the list view stays.

Workaround: reload and then select the project from My Projects

Ability to save "views" - i.e. specific state of a view such as a graph, map or even grid

Need to save:

Query state
View state

Already done this a bunch of times for recline so should not be too hard - if you are thinking of working on this check out the State related stuff in Recline!

Load / save projects (to local storage)

Suggest using:

https://github.com/jeromegn/Backbone.localStorage

Running transform doesn't give indication it does something

I was trying out the data converter with a Google spreadsheet (https://docs.google.com/spreadsheet/ccc?key=0AlgwwPNEvkP7dGxsWFhoeWljWV9BNHVMbFRVRHQyZXc#gid=0). I would like to convert the tags to lower case, so I ran the following transform:

function(doc) {
  doc['tags'] = doc['tags'].toLowerCase();
  return doc;
}

If I click "Run on all records" there is no indication that the run has finished. Can we have this?
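
One way the run could report back, sketched under assumptions: apply the transform to a copy of every record, collect failures, and return a summary string the UI could show when the run completes. `runOnAllRecords` and the message format are illustrative, not the current implementation.

```javascript
// Apply a transform to every record and return a summary the UI can display.
// Each record is shallow-copied first so the transform never mutates the input.
function runOnAllRecords(records, transform) {
  var results = [];
  var failures = [];
  records.forEach(function (record, index) {
    try {
      results.push(transform(Object.assign({}, record)));
    } catch (err) {
      failures.push({ index: index, message: String(err) });
      results.push(record); // keep the original record on failure
    }
  });
  return {
    records: results,
    failures: failures,
    message: 'Transformed ' + (records.length - failures.length) +
      ' of ' + records.length + ' records'
  };
}
```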

Use google picker to load from google docs

Auto-adjust heights / widths of grids, graphs etc on project view page based on screen size

github integration reports success on failure

Improve id generation and saving

Prepend dataexplorer when saving to localStorage (and don't include it in id itself)
Use meaningful names and just append '-{integer}' to avoid duplication ...
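
A small sketch of both points, assuming ids are derived from the project title. The slug rules, the starting suffix of 2 and the fallback id `project` are illustrative choices.

```javascript
var STORAGE_PREFIX = 'dataexplorer-'; // prefix for localStorage keys only

// Turn a title into a readable id, appending -2, -3, ... if it is taken.
function makeProjectId(title, existingIds) {
  var base = title.toLowerCase().trim()
    .replace(/[^a-z0-9]+/g, '-')   // collapse anything non-alphanumeric
    .replace(/^-+|-+$/g, '') || 'project';
  if (existingIds.indexOf(base) === -1) return base;
  var n = 2;
  while (existingIds.indexOf(base + '-' + n) !== -1) n += 1;
  return base + '-' + n;
}

// The key used in localStorage; the id itself stays prefix-free.
function storageKey(id) {
  return STORAGE_PREFIX + id;
}
```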

Persisting per-user Data Explorer config (incl e.g. list of projects)

For time being will just be the list of projects.

{
  projects: [
    {
      id: ...
      gist_id: ... # maybe the same
      state: active | deleted
    },
    ....
  ]
}

Persistence to special gist

Name: DataExplorerConfig.json

Boot sequence:

if not logged in: END
(if logged in) get all gists: http://developer.github.com/v3/gists/#list-gists
search for DataExplorerConfig.json
- if it does not exist, we create a local DataExplorerConfig model with an empty list of projects
- if it does exist, load the data and initialize DataExplorerConfig with it

Persistence is automatic on each change ...
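
The lookup and initialization steps of the boot sequence might look like the sketch below. It assumes each gist in the list response carries a `files` object keyed by filename, and it leaves out the network calls and login check entirely. Fetching the file content is treated as a separate step. `findConfigGist` and `initConfig` are hypothetical names.

```javascript
var CONFIG_FILENAME = 'DataExplorerConfig.json';

// Find the gist holding the config among the gists returned by the list call.
// Each gist is assumed to have a `files` object keyed by filename.
function findConfigGist(gists) {
  for (var i = 0; i < gists.length; i++) {
    if (gists[i].files && gists[i].files[CONFIG_FILENAME]) return gists[i];
  }
  return null;
}

// Build the local config model: parsed content if present,
// otherwise an empty list of projects.
function initConfig(fileContent) {
  if (fileContent === null || fileContent === undefined) {
    return { projects: [] };
  }
  var parsed = JSON.parse(fileContent);
  return { projects: Array.isArray(parsed.projects) ? parsed.projects : [] };
}
```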

Rename projects view to dashboard

Introduce notifications in the UI

Loading
Success / fail (e.g. of transformations)

Load CSVs from places other than github

[super] Project objects

Project objects encapsulating a given activity around a dataset

Dataset (or source thereof)
Scripts
Views (graphs, maps etc with config)
Export destinations

Running "transform" results in all entries the same

Steps to reproduce (chrome)

load the spreadsheet from: https://docs.google.com/spreadsheet/ccc?key=0AlgwwPNEvkP7dGxsWFhoeWljWV9BNHVMbFRVRHQyZXc#gid=0
go to transform
run the following transform:
function(doc) { doc['tags'] = doc['tags'].toLowerCase(); return doc; }
switch back to grid view: All the entries are the same

Import/Export and Data workflow

This is an overview of how a user usually works with data (see attached diagram). There are many formats and data services, so a modular architecture is needed for maximum flexibility, which in turn should give the most useful user experience.

Data can generally either be serialized into a file and stored somewhere, or accessed using APIs.

System components

Importers/Exporters
- Backends - transfer format is dictated by API, needs credential management
- Remote File - probably some proxy needed for cross origin, needs optional credential management
- Local file - File API, Drag and Drop
- Clipboard - using textareas or clipboard libraries
Service detector
- For remote services, the most convenient approach is to specify just a URL and have the system guess the service from it, e.g. whether it is a GDocs spreadsheet, a CKAN dataset, etc. Then prompt for more details only if necessary.
Format detector for deserialization
- Similar to service detector but for formats
Deserializers
- for each format, e.g. csv, json, xml
- provide auto-detection with reasonable defaults, ask user only if necessary
Serializers
- text based, e.g. csv, json, xml, (xls?)
- image based - canvas, svg, export to bitmap, pdf
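
The service detector and format detector could start as simple URL inspection, as in this minimal sketch. The hostname and extension rules are illustrative guesses, not a complete or verified list, and `detectSource` is a hypothetical name.

```javascript
// Guess the service and the format from a URL; null fields mean "ask the user".
function detectSource(rawUrl) {
  var parsed;
  try {
    parsed = new URL(rawUrl);
  } catch (e) {
    return null; // not a usable URL
  }

  // Service detection by hostname (illustrative rules only).
  var service = null;
  if (parsed.hostname === 'docs.google.com') {
    service = 'gdocs';
  } else if (/(^|\.)github\.com$/.test(parsed.hostname)) {
    service = 'github';
  }

  // Format detection by file extension on the path.
  var match = /\.(csv|json|xml)$/i.exec(parsed.pathname);

  return {
    service: service,
    format: match ? match[1].toLowerCase() : null
  };
}
```

Anything the detector cannot decide stays `null`, which is the point at which the UI would prompt for more detail.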

Formats

text based - csv, json, xml, HTML tables
binary - xls, ods
maps, graphs - images - png, jpg, svg, pdf

Backends (as in Recline.js)

ckan
couchdb
csv
dataproxy
elasticsearch
gdocs
memory
solr

CSV uses the memory backend; it is not logically a backend, just a format. A file/document backend that takes a format would therefore be more flexible.

Auto-detection

The user should be asked for additional input as little as possible. Reasonable defaults or auto-detection should be used instead. For exporting, a live preview of part of the data should be available.

Backends need data from the user to specify credentials or format options. That data can be viewed simply as a JSON object. A general form-building library like Alpaca (based on JSON-Schema) or Backbone-Forms could be used. This approach has the advantage that adding a new format or backend does not require writing UI-related code.

Operation stack

Instead of just exporting and saving static data, it would be convenient to offer an option to share application state (the stack of applied operations, queries, transformations, visualization options, etc.) via URL. This encourages easy sharing, and if the data is corrected in the original source, all derived data would also appear corrected.
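
A minimal sketch of carrying state in the URL, assuming the state is a plain JSON-serializable object. The `#state=` prefix and the function names are illustrative. Long operation stacks may exceed practical URL lengths, in which case a stored state referenced by ID would be needed instead.

```javascript
var STATE_PREFIX = '#state=';

// Serialize application state (queries, transformations, view options) into a URL hash.
function encodeState(state) {
  return STATE_PREFIX + encodeURIComponent(JSON.stringify(state));
}

// Restore state from a URL hash; returns null if absent or malformed.
function decodeState(hash) {
  if (hash.indexOf(STATE_PREFIX) !== 0) return null;
  try {
    return JSON.parse(decodeURIComponent(hash.slice(STATE_PREFIX.length)));
  } catch (e) {
    return null;
  }
}
```

On load the app would call `decodeState(window.location.hash)` and replay the stored operations against the original source.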

recline issues:

dataexplorer issues:

#18 Configurable save

Next steps

discussion
sketches of screens for DataExplorer using above architecture
propose class structure for additional Recline.js functionality

SlickGrid grid intermittent error "Uncaught Error: Cannot find stylesheet."

Have been intermittently getting an error where SlickGrid does not display and have in console: "Uncaught Error: Cannot find stylesheet." (slick.grid.min.js:39)

[super] Relayout main project view (primary data explorer with grid etc)

This is the primary work area. Want to get the layout right. Key principles:

Minimize clutter
Key components - this relates to the design document
- Grid (just a special view??)
- Views (graphs, maps etc etc)
- Summary (Readme) - #78
- Scripts
- Info (info about the project and raw representation of content?)

Where do we display things e.g. grid and script editor together?

Very incomplete Sketch - Google Drawing

Chrome  v23.0.1271.97 m

About / Intro / Splash page

Nice page giving a quick intro and overview of what is on offer

Save and load scripts

We should be able to save and load clean up scripts from github

Save of scripts should be to a gist by default (later we can add choice of save location)
- If we loaded the script we should remember that (localStorage or a cookie) and then save back to there
Load scripts - specify location similar to specification of data location

Scripts in Model

Part of #35 (scripts & scripting)

Implementation

Should look like a gist pretty much :-)

{
  # aka name (but unique)
  id: ...
  # the content of the script 
  content: 
  language: javascript
}

Possible for the future

  # e.g. transform, standard ...
  type: ...
  # for remote scripts (i.e. ones you import and reuse)
  url:

Switch header to bootstrap etc

Load data from GDocs

Configurable save

Save by default to source from which we loaded
- Requires that backend is writable - not so for CSV from disk and online CSV (??)
Could just allow this to be configurable - so you can choose from github or ...

Read support is almost trivial (get this straight from recline)
Write support - now this is interesting and I (@rgrp) have thought about this a lot - see below for summary

Write Support to GDocs in JS

Google now use OAuth. This is normally a PITA to support (witness the hassle to get login to github via oauth) but Google specifically support client side stuff:

The Google OAuth 2.0 Authorization Server supports JavaScript applications (JavaScript running in a browser). Like the other scenarios, this one begins by redirecting a browser (popup, or full-page if needed) to a Google URL with a set of query string parameters that indicate the type of Google API access the application requires. Google handles the user authentication, session selection, and user consent. The result is an access token. The client should then validate the token. After validation, the client includes the access token in a Google API request.

To find out more we need the Google docs on OAuth for client-side apps

[super] Data Cleaning Examples

General Thoughts

Many useful examples require ability to load multiple datasets => we must be able to load remote data as part of the scripting.
- More strongly: does a focus on a single dataset in a project make sense? Refine does that but ...
Geocoding also requires external access

Scripting Library

For scripting to be really useful we need some standard functions

plot(dataset, config, name) - #69
loadData(urlOrConfig, ...).done(function(dataObject) {}) - could we just use bits of recline atm? - #74
geocode - #68
saveDataset
direct xhr ... - #66

External access

=> We need an ajax library - see #66

Use Cases

Cleaning

Population from World Bank: https://github.com/datasets/population/blob/master/scripts/process.py
Geocoding - #28 (but which dataset ...)

Merging / Transforming

Deflating (taking out inflation)
- UK home prices - https://github.com/datasets/house-prices-uk
- stock market, ...
Per-capitizing
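
As a worked illustration of these two use cases: deflating rescales each nominal value by the ratio of the price index in a base year to the index in that value's year, and per-capitizing divides a total by population. The function names are hypothetical and the figures in the usage note are made up.

```javascript
// Convert nominal values to real values using a price index.
// real = nominal * (index in base year / index in that year)
function deflate(rows, priceIndex, baseYear) {
  var base = priceIndex[baseYear];
  return rows.map(function (row) {
    return {
      year: row.year,
      value: row.value * base / priceIndex[row.year]
    };
  });
}

// Divide a total by population to get a per-person figure.
function perCapita(total, population) {
  return total / population;
}
```

For example, with an index of 80 in 2000 and 100 in 2010, a nominal value of 100 in 2000 becomes 125 in 2010 prices. Both steps need a second dataset (the price index or the population), which is why loading multiple datasets matters for scripting.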

Miscellaneous

Doing sums ... (how useful ...)
Binning (pivot tables ...)
