Comments (6)
Re your questions:
- I switched the order of resources.
- geojson is not a tabular data format as far as I know, and we should probably treat such files as binary files (i.e. copy as is from source to package) and not as tabular sources (which are converted to csv and regular json).
@rufuspollock - perhaps we need a third category here, for files that we want to keep both in original form and also extract the data out - although I'm not sure what exactly that process would look like for geojson.
from assembler.
This order switch solves the problem for the first resource only. Indexes for the other resources become i*2, e.g., the resource with original index 1 now has index 2, etc. The solution would be to put all json versions at the end:
// Original:
resources: [csv1, csv2]
// Transformed:
resources: [csv1, csv2, json1, json2]
On the other hand, we could require publishers to always use resource name (now it is either name or index) to reference a resource so we don't need to care about indexes.
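To make the index shift concrete, here is a minimal sketch, not the assembler's actual code. The helper names and the `<name>_json` naming are made up for illustration. It compares interleaving the JSON versions with appending them at the end, and shows a lookup by name that does not depend on order at all.

```python
def json_version(resource):
    # Hypothetical naming: the derived resource is called "<name>_json".
    return {"name": resource["name"] + "_json", "format": "json"}


def interleave(resources):
    # Each JSON version directly follows its source, so original index i becomes i*2.
    out = []
    for r in resources:
        out += [r, json_version(r)]
    return out


def append_at_end(resources):
    # Originals keep their indexes; JSON versions are added after them.
    return list(resources) + [json_version(r) for r in resources]


def find_by_name(resources, name):
    # Name-based lookup is independent of the resource order.
    return next(r for r in resources if r["name"] == name)


original = [{"name": "csv1", "format": "csv"}, {"name": "csv2", "format": "csv"}]
```

With `interleave(original)`, `csv2` sits at index 2, while with `append_at_end(original)` it stays at index 1. `find_by_name` returns the same resource in both cases.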
@akariv we should not be adding the JSON version of resources to the resource list IMO. The derived files should:
- EITHER: be kept separate from the datapackage.json
- OR: be added in a separate section
This probably needs a bit of thought. My inclination would be the first option (not included), and we just use a convention to locate them for now. (Amongst other things, these are not separate resources that should have a separate rendering in the frontend, but just a conversion of a given resource to a different format.)
Aside: I think we may want _datahub as the path rather than .datahub as the directory name. What do people think?
@akariv I guess this raises some interesting questions re pipelines and our setup here. In pipelines, datapackage.json is being used as the manifest, so adding the json version involves adding a new resource. However, in terms of datapackage.json, I don't think we want these derived files to show up as new resources. I think we probably need to think this through in some way asap.
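As a rough sketch of the tension described above: if the pipeline manifest lists everything as resources, the published descriptor could be produced by splitting the derived entries out. The `derived_from` key and the `split_manifest` function are hypothetical. They are not part of the Data Package spec or of the pipelines code, and are only meant to illustrate the idea.

```python
def split_manifest(manifest):
    """Split a pipeline manifest into a publishable descriptor and derived files.

    Assumption: derived resources carry a hypothetical "derived_from" key
    holding the name of the resource they were converted from.
    """
    originals = [r for r in manifest["resources"] if "derived_from" not in r]
    derived = [r for r in manifest["resources"] if "derived_from" in r]
    # The descriptor keeps all other metadata but lists only original resources.
    descriptor = {**manifest, "resources": originals}
    return descriptor, derived


manifest = {
    "name": "example",
    "resources": [
        {"name": "csv1", "path": "data/csv1.csv"},
        {"name": "csv1_json", "path": "_datahub/csv1.json", "derived_from": "csv1"},
    ],
}
descriptor, derived = split_manifest(manifest)
```

Here `descriptor` lists only `csv1`, and `derived` could either be stored next to the package or written to a separate section, depending on which option is chosen.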
@rufuspollock I don't really follow.
Why would we want to use something other than the datapackage, and resort to a convention to locate files instead of using the standard?
Why wouldn't we want to provide these extra files (different formats, validation results etc) as part of the package?
@akariv because the derived stuff is derived. From the presentation PoV these are not "real" resources but simply a different formatting of the original resource. There are different ways to look at this:
- As a Publisher I want my consumers to get the data in my original data package by default (not all the derived data) so that they can work with it without lots of extraneous info (and in standard way)
- As a User I want to clearly distinguish the data package from derived resources so that I can choose what I get and in particular get the original data package easily
- As a User viewing a data package I want to see the actual resources in a data package (perhaps along with their derived versions) but without all the other "derived" data files so that I have a clear view on the data in this package
- NB: it is important that I can associate a specific derived file with its underlying resource (e.g. so that I can present the JSON version option next to the CSV version in the interface)
- As an Admin I want to know the amount of space being used by a given package so that I can report this to the user (and bill based on this)
There are lots of ways we could think about implementing this - probably worth a chat.
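Two of the stories above (associating a derived file with its underlying resource, and reporting the space a package uses) can be sketched as follows. This is only an illustration under assumptions: `derived_from` is a hypothetical key linking a derived file to its source resource, and sizes are read from a `bytes` property on each entry.

```python
from collections import defaultdict


def derived_by_source(derived):
    # Group derived files under the name of the resource they come from,
    # so a frontend could show e.g. the JSON option next to the CSV one.
    grouped = defaultdict(list)
    for d in derived:
        grouped[d["derived_from"]].append(d)
    return dict(grouped)


def total_bytes(resources, derived):
    # Space used by the whole package: originals plus derived files.
    # Entries without a "bytes" value count as 0.
    return sum(r.get("bytes", 0) for r in list(resources) + list(derived))


resources = [{"name": "csv1", "bytes": 100}, {"name": "csv2", "bytes": 50}]
derived = [
    {"name": "csv1_json", "derived_from": "csv1", "bytes": 300},
    {"name": "csv2_json", "derived_from": "csv2", "bytes": 120},
]
```

Keeping the link explicit means the original package can be served on its own, while the derived files remain discoverable per resource and still count toward storage.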
WONTFIX / INVALID. This is now no longer relevant since we switched to "extended" datapackage.json in the pkgstore.