dbpedia / dataid

The DBpedia Data ID Unit is a DBpedia group whose goals are to describe LOD datasets via RDF files, to host and deliver these metadata files together with the dataset in a uniform way, to create and validate such files, and to deploy the results for DBpedia and its local chapters.

HTML 26.70% JavaScript 50.70% Python 0.31% CSS 0.91% PHP 14.97% Java 6.33% Makefile 0.09%

dataid's Introduction

Since we switched to Mercurial, there are several other repositories.
This is the default repository; it contains miscellaneous things like the logo.
The Extraction Framework, for example, is in the "extraction_framework" repository.
Please look here: http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/

dataid's Issues

add versioning to model

dataid:version property that links to the dataset resource for a specific version
dataid:latestVersion property that points to the most recent version
a version is a dataset that is not a subset of the main dataset
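
A minimal Turtle sketch of how the two proposed properties might be used. The dataid: namespace IRI and all ex: resources are assumptions made for illustration, and dataid:version and dataid:latestVersion are only the properties proposed in this issue, not confirmed ontology terms.

```turtle
@prefix dataid: <http://dataid.dbpedia.org/ns/core#> .  # assumed namespace IRI
@prefix ex:     <http://example.org/dataid/> .          # hypothetical resources

# The main dataset links to one dataset resource per version.
ex:dbpedia
    a dataid:Dataset ;
    dataid:version ex:dbpedia-3.9 , ex:dbpedia-2014 ;
    dataid:latestVersion ex:dbpedia-2014 .

# Each version is a dataset in its own right, not a void:subset of ex:dbpedia.
ex:dbpedia-3.9  a dataid:Dataset .
ex:dbpedia-2014 a dataid:Dataset .
```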

Figure out "completeness" of distributions

-When does a set of dataset distributions make up the complete dataset?
-How can you check or formulate this?

Reduce number of types of dataset

handle "dataid:Dataset, dcat:Dataset, void:Dataset, prov:Entity" in ontology instead of each resource
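
One way to move the typing into the ontology is a set of subclass axioms, so that each resource only needs a single explicit type. This is a sketch under the assumption that consumers apply RDFS inference; the dataid: namespace IRI and the ex: resource are assumed for illustration.

```turtle
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcat:   <http://www.w3.org/ns/dcat#> .
@prefix void:   <http://rdfs.org/ns/void#> .
@prefix prov:   <http://www.w3.org/ns/prov#> .
@prefix dataid: <http://dataid.dbpedia.org/ns/core#> .  # assumed namespace IRI
@prefix ex:     <http://example.org/dataid/> .          # hypothetical resource

# Stated once, in the ontology.
dataid:Dataset
    rdfs:subClassOf dcat:Dataset , void:Dataset , prov:Entity .

# Each description then carries one type; the others follow by RDFS inference.
ex:dbpedia-labels-en-2014 a dataid:Dataset .
```

The trade-off is that consumers without a reasoner would no longer see the dcat:, void: and prov: types directly.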

Integrate with semantic site map / Generate semantic site maps

-Generate the dataset description part of semantic site map from dataid
-find out the coverage of semantic site map data by dataid

Support for enumerations of datasets

In many cases it would be useful to be able to state that a certain dataset is comprised of a set of other datasets.

Consider DBpedia:
There are several files under http://downloads.dbpedia.org/2014/en, each of which can be considered a distribution of a dataset that corresponds to its own content.

Now we may want to state that the dataset that corresponds to the data in the official DBpedia endpoint is comprised of a set of datasets that correspond to the individual files that were loaded into the endpoint.

If I understood correctly, this would currently be accomplished by using void:subset.
However, this would not allow stating multiple enumerations of datasets that make up a specific larger dataset.

Instead, I propose to introduce a DatasetEnumeration:
Such an enumeration explicitly states that a certain targetDataset is comprised of exactly a given set of source datasets (i.e. the triples referred to by the merge of the source datasets must be equal to those of the target dataset).

This would also allow stating multiple enumerations for a single dataset, such as a mixture of official enumerations and custom ones (such as pre-merged files (e.g. types, geo, properties + labels) of DBpedia hosted on a third party server).

Also, metadata can then be attached to the enumerations themselves, such as who created them and when.

:foo
    a dataid:DatasetEnumeration ;
    dcterms:modified "someDate"^^xsd:dateTime ;
    dcterms:contributor :user1235 ;
    skos:theme :official ; # Tag the enumeration as an official one so we can filter by that tag
    rdfs:comment "Datasets included in the official DBpedia"@en ;
    dataid:targetDataset :official-dbpedia-sparql-endpoint ;
    dataid:sourceDatasets (
        :dbpedia-types-2014
        :dbpedia-mapped-properties-2014
        :dbpedia-labels-en-2014
        # ...
    ) # Don't actually use blank nodes - it is just written like this for convenience.
    .

Provide upload to datahub menu in generator

-API key (string), organization (string), datahub login name -> push ttl as http post
-URI not yet certain

Add property for graph of the dataset in a SPARQL endpoint

could be void

Tags and literal types

make sure they are coherent and existent

add linksets to model

add void:Linkset descriptions to show links between datasets, the number of links and possibly the type of their relation
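
A linkset description along these lines could look like the following sketch. It uses only terms from the VoID vocabulary; the ex: resources and the triple count are placeholders, not real figures.

```turtle
@prefix void: <http://rdfs.org/ns/void#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix ex:   <http://example.org/dataid/> .  # hypothetical resources

ex:dbpedia-to-geonames
    a void:Linkset ;
    void:subjectsTarget ex:dbpedia ;   # dataset containing the link subjects
    void:objectsTarget  ex:geonames ;  # dataset containing the link objects
    void:linkPredicate  owl:sameAs ;   # type of relation between the datasets
    void:triples        1000 .         # number of links (placeholder value)
```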

Have a dataid triple store, save DataIDs in it

-enable dataset queries
-enable crawling
-dynamic lod cloud, statistics

dct:accrualPeriodicity: swapped domain & range

Seems to me the domain and range of this property are swapped:

dct:accrualPeriodicity
    rdfs:domain dct:Frequency ;
    rdfs:range dataid:Dataset .
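
With domain and range exchanged, the declaration would presumably read as in this sketch, so that the property points from a dataset to a frequency value. The dataid: namespace IRI is an assumption.

```turtle
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dct:    <http://purl.org/dc/terms/> .
@prefix dataid: <http://dataid.dbpedia.org/ns/core#> .  # assumed namespace IRI

dct:accrualPeriodicity
    rdfs:domain dataid:Dataset ;  # the subject is the dataset being described
    rdfs:range  dct:Frequency .   # the object is the update frequency
```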

Also, in the examples you use http://purl.org/linked-data/sdmx/2009/code#freq-W meaning Weekly.
In http://vocab.getty.edu/doc/#Descriptive_Properties I've used:

-voag:frequencyOfChange and dct:accrualPeriodicity. frequencyOfChange is more accurate for our purpose: accrualPeriodicity means how often issues are added to a journal or papers are added to an archive, while frequencyOfChange means how often a dataset is refreshed.
-value voag:BiWeekly, because that's how often Getty updates. VOAG provides more periodicity values than SDMX.

BTW in http://vocab.getty.edu/doc/#Descriptive_Information I do a "mash up" of several descriptive ontologies, not unlike DataID. I did this in Mar 2014. I like your well-defined approach to such "mash up" better, and have recorded a future Getty task to use DataID. But perhaps you can borrow some ideas from my "mash up"...

Create a landing page

Create a landing page with some documentation

Add accessURL for sparqlEndpoints

add granular licensing model

Example: ODRL (http://www.w3.org/community/odrl/two/vocab/)

Define a best practice for a DataID's URI

Where is the DataID for any given dataset located?
For example, requesting http://dbpedia.org with MIME type application/rdf+xml would return a DataID.

Provide an endpoint that aggregates dataids

-Aggregate DataIDs in a SPARQL-Endpoint to allow for search, relations, browsing

parentDataID property

add a property for void:DatasetDescription resources to link to parent DataIDs

Update ontology and dbpedia id file

Service description should be more granular

What about RESTful services, APIs etc? More Service types needed.

Create Debian packages out of dataid

see LOD2 project

How to describe data using DataID available from relational database connections, e.g. JDBC

Dear DataID-team,

Related to the question mentioned in w3c/dxwg#1240, I'd like to ask the developers of DataID how to describe a dataset that is stored in a relational database and published through a JDBC connection.

A reference to an entire database is of interest, e.g. through a JDBC string (such as "jdbc:oracle:thin:@hostname:1521:my-database" or "jdbc:hive2://hostname:8443/my-database"), as is a reference to a table in the database.

It seems that DataID extends DCAT in this sense; it would be helpful to see an example of how to serialise this, potentially also in the documentation.

Best

Georg

many additional (or fewer) triples
many new links
datasource not found......

create a barebone data repository

-Datacat, check with Claus

isDistributionOf

dcat:distribution^-1 Property
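
In OWL, such an inverse could be declared as in the sketch below. The property name dataid:isDistributionOf follows the issue title; the dataid: namespace IRI and the domain and range statements are assumptions.

```turtle
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix dcat:   <http://www.w3.org/ns/dcat#> .
@prefix dataid: <http://dataid.dbpedia.org/ns/core#> .  # assumed namespace IRI

dataid:isDistributionOf
    a owl:ObjectProperty ;
    owl:inverseOf dcat:distribution ;  # dataset -> distribution, reversed
    rdfs:domain dcat:Distribution ;    # assumed
    rdfs:range  dcat:Dataset .         # assumed
```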