dbpedia / dataid

The DBpedia Data ID Unit is a DBpedia group whose goals are to describe LOD datasets via RDF files, to host and deliver these metadata files together with the datasets in a uniform way, to create and validate such files, and to deploy the results for DBpedia and its local chapters.

HTML 26.70% JavaScript 50.70% Python 0.31% CSS 0.91% PHP 14.97% Java 6.33% Makefile 0.09%

dataid's Introduction

Since we switched to Mercurial, there are several other repositories.
This is the default repository, and it contains miscellaneous things like the logo.
The Extraction Framework, for example, is in the "extraction_framework" repository.
Please look here: http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/

dataid's People

Contributors

chile12, cirola2000, der-bruemmer, jimkont, kurzum

dataid's Issues

add versioning to model

dataid:version property that links to the dataset resource for a specific version
dataid:latestVersion property that points to the latest version
a version is a dataset that is not a subset of the main dataset
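
A minimal sketch of how the two proposed properties could be read by a client. It models triples as plain Python tuples rather than using an RDF library. The resource names (:example, :example-v1, :example-v2) are hypothetical, and the consistency rule (the latest version should also be listed as a regular version) is an assumption, not something the proposal states.

```python
# Triples are modelled as (subject, predicate, object) tuples.
# All resource names are hypothetical placeholders.
TRIPLES = {
    (":example", "dataid:version", ":example-v1"),
    (":example", "dataid:version", ":example-v2"),
    (":example", "dataid:latestVersion", ":example-v2"),
}


def versions_of(dataset, triples):
    """Return every version resource linked via dataid:version."""
    return {o for s, p, o in triples if s == dataset and p == "dataid:version"}


def latest_version_of(dataset, triples):
    """Return the dataid:latestVersion target, or None if it is absent."""
    latest = [o for s, p, o in triples
              if s == dataset and p == "dataid:latestVersion"]
    return latest[0] if latest else None


def latest_is_consistent(dataset, triples):
    """Assumed rule: the latest version is also listed as a regular version."""
    latest = latest_version_of(dataset, triples)
    return latest is None or latest in versions_of(dataset, triples)
```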

Support for enumerations of datasets

In many cases it would be useful to be able to state that a certain dataset is comprised of a set of other datasets.

Consider DBpedia:
There are several files under http://downloads.dbpedia.org/2014/en, each of which can be considered a distribution of a dataset that corresponds to its own content.

Now we may want to state that the dataset that corresponds to the data in the official DBpedia endpoint is comprised of a set of datasets that correspond to the individual files that were loaded into the endpoint.

If I understood correctly, this would currently be accomplished by using void:subset.
However, this would not allow stating multiple enumerations of datasets that make up a specific larger dataset.

Instead, I propose to introduce a DatasetEnumeration:
Such an enumeration explicitly states that a certain targetDataset is exactly comprised of a set of given source datasets (i.e. the triples referred to by the merge of the source datasets must be equal to those of the target dataset).

This would also allow stating multiple enumerations for a single dataset, such as a mixture of official enumerations and custom ones (such as pre-merged files (e.g. types, geo, properties + labels) of DBpedia hosted on a third party server).

Also, metadata can then be attached to the enumerations themselves, such as who created them and when.

:foo
    a dataid:DatasetEnumeration ;
    dcterms:modified "someDate"^^xsd:dateTime ;
    dcterms:contributor :user1235 ;
    skos:theme :official ; # Tag the enumeration as an official one so we can filter by that tag
    rdfs:comment "Datasets included in the official DBpedia"@en ;
    dataid:targetDataset :official-dbpedia-sparql-endpoint ;
    dataid:sourceDatasets (
        :dbpedia-types-2014
        :dbpedia-mapped-properties-2014
        :dbpedia-labels-en-2014
        ...
    ) # Don't actually use blank nodes - it is just written like this for convenience.
    .
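
The equality condition stated above (the merge of the source datasets must equal the target dataset) can be sketched as a plain set comparison. This is only an illustration under simplifying assumptions: triples are Python tuples, blank nodes are ignored, and the sample data is made up.

```python
def enumeration_holds(target_triples, source_datasets):
    """True if the merged source datasets contain exactly the target's triples."""
    merged = set().union(*source_datasets)
    return merged == set(target_triples)


# Made-up sample data standing in for the contents of individual dump files.
TYPES = {(":Berlin", "rdf:type", "dbo:City")}
LABELS = {(":Berlin", "rdfs:label", "Berlin")}
TARGET = TYPES | LABELS

# Two different enumerations of the same target dataset:
OFFICIAL = [TYPES, LABELS]   # one source dataset per file
CUSTOM = [TYPES | LABELS]    # a single pre-merged file
```

Both enumerations satisfy the condition for the same target, which is the case void:subset alone cannot express; an incomplete list such as `[TYPES]` does not.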

add linksets to model

add void:Linkset to show links between datasets, the number of links, and possibly the type of their relation
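
As a rough illustration of the numbers such a linkset description would carry, the sketch below counts links from one dataset to another, grouped by link predicate (in VoID terms, roughly void:triples per void:linkPredicate). It treats a triple as a link when its subject belongs to the source dataset and its object to the target dataset. The resources and triples are made up, and membership is simplified to plain sets of resource names.

```python
from collections import Counter


def link_counts(triples, source_resources, target_resources):
    """Count links from the source to the target dataset, per predicate."""
    return Counter(
        p for s, p, o in triples
        if s in source_resources and o in target_resources
    )


# Made-up resources of two datasets and some triples between them.
SOURCE = {":a1", ":a2"}
TARGET = {":b1", ":b2"}
TRIPLES = [
    (":a1", "owl:sameAs", ":b1"),
    (":a2", "owl:sameAs", ":b2"),
    (":a1", "rdfs:seeAlso", ":b2"),
    (":a1", "rdfs:label", "A one"),  # not a link: the object is a literal
]

COUNTS = link_counts(TRIPLES, SOURCE, TARGET)
```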

dct:accrualPeriodicity: swapped domain & range

Seems to me the domain and range of this property are swapped:

dct:accrualPeriodicity
    rdfs:domain dct:Frequency ;
    rdfs:range dataid:Dataset .

Also, in the examples you use http://purl.org/linked-data/sdmx/2009/code#freq-W meaning Weekly.
In http://vocab.getty.edu/doc/#Descriptive_Properties I've used:

  • voag:frequencyOfChange and dct:accrualPeriodicity. frequencyOfChange is more accurate for our purpose: accrualPeriodicity means how often issues are added to a journal or papers are added to an archive; while frequencyOfChange means how often a dataset is refreshed
  • value voag:BiWeekly because that's how often Getty updates. VOAG provides more periodicity values than SDMX.

BTW in http://vocab.getty.edu/doc/#Descriptive_Information I do a "mash up" of several descriptive ontologies, not unlike DataID. I did this in Mar 2014. I like your well-defined approach to such "mash up" better, and have recorded a future Getty task to use DataID. But perhaps you can borrow some ideas from my "mash up"...
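
The practical effect of the swapped domain and range reported above can be seen by applying the two RDFS entailment rules for rdfs:domain and rdfs:range to a single statement. This is a minimal sketch with triples as Python tuples; :my-dataset is a hypothetical resource, and sdmx-code: is used here as a shorthand prefix for http://purl.org/linked-data/sdmx/2009/code#.

```python
def infer_types(data, schema):
    """Apply the RDFS domain and range rules once over the data triples."""
    inferred = set()
    for s, p, o in data:
        for prop, kind, cls in schema:
            if prop != p:
                continue
            if kind == "rdfs:domain":
                inferred.add((s, "rdf:type", cls))  # subject gets the domain class
            elif kind == "rdfs:range":
                inferred.add((o, "rdf:type", cls))  # object gets the range class
    return inferred


DATA = {(":my-dataset", "dct:accrualPeriodicity", "sdmx-code:freq-W")}

# Declaration as reported in the issue.
SWAPPED = {
    ("dct:accrualPeriodicity", "rdfs:domain", "dct:Frequency"),
    ("dct:accrualPeriodicity", "rdfs:range", "dataid:Dataset"),
}

# Declaration with domain and range exchanged.
FIXED = {
    ("dct:accrualPeriodicity", "rdfs:domain", "dataid:Dataset"),
    ("dct:accrualPeriodicity", "rdfs:range", "dct:Frequency"),
}
```

With `SWAPPED`, a reasoner would type the dataset as a dct:Frequency and the frequency code as a dataid:Dataset; with `FIXED`, the inferred types match the intended meaning.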

How to describe data using DataID available from relational database connections, e.g. JDBC

Dear DataID-team,

Related to the question raised in w3c/dxwg#1240, I'd like to ask the developers of DataID how to describe a dataset that is stored in a relational database and published through a JDBC connection.

A reference to an entire database (e.g. through a JDBC string such as "jdbc:oracle:thin:@hostname:1521:my-database" or "jdbc:hive2://hostname:8443/my-database") or to a table in the database is of interest.

It seems that DataID extends DCAT in this sense, so it would be helpful to see an example of how to serialise this, potentially also in the documentation.

Best

Georg

Language variant not visible

The language variant of a dataset is not visible in the DataID except for the language tag in the URI.
This makes it hard to distinguish datasets on the application side.

Fix Agent roles and property

- keep using contactPoint as the property between dataset and agent
- add an "associatedAgent" property
- add agent roles: contact point, maintainer, publisher, creator, contributor...
