Comments (14)

dweinberger commented on May 29, 2024

If we're going for simple (+1), might we consider Schema.org? https://schema.org/Dataset

from harvardopendata.github.io.

rebeccawilliams commented on May 29, 2024

For reference @dweinberger, here is how schema.org maps to DCAT: https://project-open-data.cio.gov/v1.1/metadata-resources/#field-mappings

nsinai commented on May 29, 2024

Thanks @rebeccawilliams !

nsinai commented on May 29, 2024

Proposed MVP of Schema:
Title
Description
Tags
Last Update
Publisher
Contact Name
Contact Email
Access URL
Download URL
License
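
As a rough illustration of the list above, the ten proposed fields could be modeled as a small record type. This is only a sketch: the class name `DatasetEntry`, the helper `missing_fields`, the snake_case attribute names, and the choice of types are all assumptions made for the example, not names taken from the project's schema.md.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class DatasetEntry:
    """One catalog entry with the ten proposed MVP fields.

    Attribute names are hypothetical; they only mirror the list above.
    """
    title: str
    description: str
    tags: List[str]
    last_update: str      # assumed to be a date string, e.g. "2015-03-01"
    publisher: str
    contact_name: str
    contact_email: str
    access_url: str
    download_url: str
    license: str


def missing_fields(raw: dict) -> List[str]:
    """Return the MVP field names that are absent or empty in a raw dict."""
    return [f.name for f in fields(DatasetEntry) if not raw.get(f.name)]
```

A catalog loader could call `missing_fields` on each raw entry before constructing a `DatasetEntry`, so that incomplete entries are reported rather than silently accepted.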

dweinberger commented on May 29, 2024

This seems like a reasonable list, but are we already in agreement that we want to adopt an existing schema rather than create our own, even if our own is just that simple?

hathix commented on May 29, 2024

The schema Nick is proposing is a pared-down version of the Data.gov schema, so you could say we're creating our own schema based on an existing one. I feel it's a good idea to keep the schema this way until we need domain-specific extensions.

dweinberger commented on May 29, 2024

I personally think it's important to state from the get-go that we are not planning on creating our own schema unless we have a compelling reason to do so. One way to flag that would be to say that we are in fact using data.gov's schema, even though we're not using all of its terms/fields/vocabulary.

If down the road it turns out that data.gov doesn't have all the fields we need, we can find one that does or extend data.gov.

But I do like saying as part of the MVP launch that we have adopted an existing standard. Sends the right signal, doesn't it?

hathix commented on May 29, 2024

So we'd say we're using a subset of the existing Data.gov standard, which we might extend later as needed?

dweinberger commented on May 29, 2024

Yes, although I think it'd be slightly better just to say that we're using the data.gov standard. That doesn't imply that we're using every available field.

And I'd leave out the "which we might extend later" part because that's always assumed. And the point of this is to signal a reluctance to create new standards and enthusiasm about using existing standards.

So, I'd say something like, "We are using the Data.gov schema to describe the data sets DROID is referencing." Perfectly true. Not misleading in the least. Clear. Excellent signal.

bsapozhnikov commented on May 29, 2024

Hi, I've just pushed a proposed schema based on the Data.gov schema to the master branch. Feel free to take a look; I'd love any feedback you have!

philipashlock commented on May 29, 2024

It's probably best not to call the schema used by Data.gov the "Data.gov standard", since it's really an international standard (called DCAT) used by a lot of other countries and data catalogs too. You can also read at the bottom of http://schema.org/Dataset that their schema is also based on DCAT. We have a particular JSON serialization of DCAT with a few additional fields, which we typically refer to as the Project Open Data Metadata schema, but even that is not exclusive to Data.gov, since it's used by local governments and incorporated directly into platforms like Socrata.

As for the current schema.md in this repo, it looks like there are a number of things out of sync with DCAT, e.g. "tags" instead of "keyword", "updated" instead of "modified", etc.

Feel free to copy our schema.md file and make use of our JSON Schema files
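
The two field-name mismatches called out above could be handled with a small rename step. The sketch below covers only those two names ("tags" to "keyword", "updated" to "modified"); the function name `to_dcat_names` and the `RENAMES` table are hypothetical, and any other fields that differ from DCAT would need to be added after checking the actual schema files.

```python
# Only the two mismatches mentioned in this thread; other fields are
# passed through unchanged and may still need attention.
RENAMES = {
    "tags": "keyword",
    "updated": "modified",
}


def to_dcat_names(record: dict) -> dict:
    """Return a copy of the record with known out-of-sync keys renamed."""
    return {RENAMES.get(key, key): value for key, value in record.items()}
```

For example, `to_dcat_names({"title": "Course catalog", "tags": ["courses"]})` keeps `title` as it is and returns the tag list under `keyword`.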

nsinai commented on May 29, 2024

Thanks @philipashlock!

@bsapozhnikov -- can you update?

nsinai commented on May 29, 2024

@mcrosas asked in an email:

"Thanks, for sharing this. What's the intention of choosing this schema? For clarification, the Dataverse supports an extensive set of metadata fields (including the fields in this data.gov schema), which map to metadata standards such as Dublin Core Terms and DataCite Schema, needed to implement best practices in data sharing and publishing."

The short answer is that this is currently a student project, with faculty support and mentoring. Similar to open data portals by governments, the idea is to catalog interesting data sets that anyone can find and use. The idea isn't to host any data, but simply to be an accessible and useful catalog.

mercecrosas commented on May 29, 2024

@nsinai the problem with this approach is that data are not guaranteed to be accessible and reusable if the catalog doesn't point to trusted archival data repositories that provide long-term access to the data. Making data open and accessible requires not only a catalog with metadata to search and learn what the dataset is about, but also long-term access to the data in a reusable format (which is what a repository like Dataverse would provide if the actual datasets for this project were hosted and archived in the repository).
