
Comments (6)

akariv commented on June 19, 2024

Re your questions:

  • I switched the order of resources.
  • GeoJSON is not a tabular data format as far as I know, so we should probably treat such files as binary files (i.e. copy them as-is from source to package) rather than as tabular sources (which are converted to CSV and regular JSON).

@rufuspollock - perhaps we need a third category here, for files that we want to keep both in original form and also extract the data out - although I'm not sure what exactly that process would look like for geojson.
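The three categories discussed here can be sketched roughly as follows. This is only an illustration: the `Handling` enum, the extension table and the `classify` helper are hypothetical names, not taken from the assembler's code, and the "both" category is the open question raised above rather than an existing feature.

```python
import posixpath
from enum import Enum


class Handling(Enum):
    TABULAR = "tabular"  # converted to CSV and regular JSON
    BINARY = "binary"    # copied as-is from source to package
    BOTH = "both"        # hypothetical third category: keep original and also extract data


# Assumed mapping from file extension to handling (illustrative only).
HANDLING_BY_EXTENSION = {
    ".csv": Handling.TABULAR,
    ".geojson": Handling.BINARY,
}


def classify(path):
    """Pick a handling category from the file extension; unknown types are copied as-is."""
    extension = posixpath.splitext(path)[1].lower()
    return HANDLING_BY_EXTENSION.get(extension, Handling.BINARY)
```

Defaulting unknown extensions to `BINARY` is a design choice of this sketch: copying a file unchanged is the option least likely to lose information.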

from assembler.

anuveyatsu commented on June 19, 2024

This order switch solves the problem for the first resource only. The indexes of the other resources become i*2, e.g. the resource with original index 1 is now at index 2, and so on. The solution would be to put all the JSON versions at the end:

// Original:
resources: [csv1, csv2]

// Transformed:
resources: [csv1, csv2, json1, json2]

On the other hand, we could require publishers to always reference a resource by name (currently either name or index is accepted), so that we don't need to care about indexes.
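To make the index shift concrete, here is a small sketch comparing the two orderings and a name-based lookup. The resource names and the helper functions (`interleave`, `append_derived`, `index_of`) are made up for illustration and do not come from the assembler.

```python
# Hypothetical original resources, identified by name.
original = [{"name": "csv1"}, {"name": "csv2"}, {"name": "csv3"}]


def json_version(resource):
    """Build an illustrative derived JSON entry for a resource."""
    return {"name": resource["name"].replace("csv", "json")}


def interleave(resources):
    """Place each derived JSON version right after its source resource."""
    out = []
    for resource in resources:
        out.append(resource)
        out.append(json_version(resource))
    return out


def append_derived(resources):
    """Keep the originals first and put all derived JSON versions at the end."""
    return list(resources) + [json_version(r) for r in resources]


def index_of(resources, name):
    """Look a resource up by name, which does not depend on ordering."""
    return next(i for i, r in enumerate(resources) if r["name"] == name)


interleaved = interleave(original)
appended = append_derived(original)

# Interleaving moves the resource at original index i to index 2*i.
interleaved_positions = [index_of(interleaved, r["name"]) for r in original]
# Appending leaves the original indexes unchanged.
appended_positions = [index_of(appended, r["name"]) for r in original]
```

With three resources, `interleaved_positions` is `[0, 2, 4]` while `appended_positions` stays `[0, 1, 2]`, so index-based references only survive the second layout.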


rufuspollock commented on June 19, 2024

@akariv we should not be adding the JSON version of resources to the resource list IMO. The derived files should:

  • EITHER: be kept separate from the datapackage.json
  • OR: be added in a separate section

This probably needs a bit of thought. My inclination would be the first option (not included), with a convention to locate the derived files for now. (Amongst other things, these are not separate resources that should have a separate rendering in the frontend, but just a conversion of a given resource to a different format.)

Aside: I think we may want _datahub as the path rather than .datahub as the directory name. What do people think?

@akariv I guess this raises some interesting questions re pipelines and our setup here. In pipelines, datapackage.json is being used as the manifest, so adding the JSON version involves adding a new resource. However, in terms of datapackage.json I don't think we want these derived files to show up as new resources. I think we probably need to think this through in some way asap.
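As a rough illustration of the "locate by convention" option: the derived file's location can be computed from the resource path alone, so datapackage.json never has to list it. The `derived_path` helper and the exact directory layout are assumptions made for this sketch; the thread only proposes the `_datahub` name.

```python
import posixpath

# Directory name for derived files; "_datahub" vs ".datahub" is still open in the thread.
DERIVED_DIR = "_datahub"


def derived_path(resource_path, fmt, derived_dir=DERIVED_DIR):
    """Return the conventional location of a derived copy of a resource.

    Hypothetical convention: <derived_dir>/<resource file stem>.<format>
    """
    stem = posixpath.splitext(posixpath.basename(resource_path))[0]
    return posixpath.join(derived_dir, f"{stem}.{fmt}")
```

For example, `derived_path("data/prices.csv", "json")` gives `_datahub/prices.json`. Note that a convention based only on the file stem would collide if two resources share a file name in different directories, which is one of the things that would need thinking through.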


akariv commented on June 19, 2024

@rufuspollock I don't really follow.
Why would we want to use something other than the datapackage, and resort to a convention to locate files instead of using the standard?
Why wouldn't we want to provide these extra files (different formats, validation results etc) as part of the package?


rufuspollock commented on June 19, 2024

@akariv because the derived stuff is derived. From the presentation PoV these are not "real" resources but simply a different formatting of the original resource. There are different ways to look at this:

  • As a Publisher I want my consumers to get the data in my original data package by default (not all the derived data) so that they can work with it without lots of extraneous info (and in a standard way)
  • As a User I want to clearly distinguish the data package from derived resources so that I can choose what I get and in particular get the original data package easily
    • As a User viewing a data package I want to see the actual resources in a data package (perhaps along with their derived versions) but without all the other "derived" data files so that I have a clear view of the data in this package
      • NB: it is important that I can associate a specific derived file with its underlying resource (e.g. so that I can present the JSON version option next to the CSV version in the interface)
  • As an Admin I want to know the amount of space being used by a given package so that I can report this to the user (and bill based on this)

There are lots of ways we could think about implementing this - probably worth a chat.
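One possible shape for this, sketched under assumptions: the descriptor keeps only the original resources, and a separate index records each derived file together with the name of the resource it came from. The `derived` list, its `source` key and both helper functions are hypothetical; `bytes` is the standard Data Package resource size property.

```python
# Descriptor with original resources only (illustrative values).
descriptor = {
    "name": "example-package",
    "resources": [
        {"name": "prices", "path": "data/prices.csv", "bytes": 2048},
    ],
}

# Hypothetical side index of derived files, kept out of the resource list.
derived = [
    {"source": "prices", "format": "json", "path": "_datahub/prices.json", "bytes": 5120},
]


def derived_for(resource_name, derived_index):
    """Return the derived files that belong to one original resource."""
    return [d for d in derived_index if d["source"] == resource_name]


def total_bytes(package, derived_index):
    """Total storage used by a package, counting originals and derived files."""
    original_size = sum(r["bytes"] for r in package["resources"])
    derived_size = sum(d["bytes"] for d in derived_index)
    return original_size + derived_size
```

Here `derived_for("prices", derived)` lets a frontend show the JSON option next to the CSV resource, and `total_bytes(descriptor, derived)` covers the Admin story by summing both kinds of files.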


rufuspollock commented on June 19, 2024

WONTFIX / INVALID. This is now no longer relevant since we switched to "extended" datapackage.json in the pkgstore.

