Comments (7)
No worries, I'll reopen this issue and just change the name.
from gtfs-bench.
Hi @Lars-H!
Happy to see that you are using the benchmark! :-) I'll answer all your questions in detail.
Materializing the virtual KG as RDF using rdfizer leads to non-absolute IRIs in the RDF.
You're right. The problem here is that the real data (the actual GTFS feed from Madrid Metro) provides the URL correctly, but the data generator does not support this kind of data. That should already be fixed by the VIG generator with our configuration. We will see what is happening.
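To locate non-absolute IRIs in a materialized dump, a rough check like the following may help. It is a minimal sketch, assuming the output is N-Triples; it treats any `<...>` term without a URI scheme as relative. The function name `relative_iris` and the sample triples (including `<www.metromadrid.es>`) are hypothetical, made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Matches terms written as <...> in an N-Triples line. This is a rough
# heuristic: it would also pick up "<...>" text inside string literals.
IRI_TERM = re.compile(r"<([^<>\s]*)>")


def relative_iris(ntriples_lines):
    """Yield every IRI term that has no scheme, i.e. is not an absolute IRI."""
    for line in ntriples_lines:
        for iri in IRI_TERM.findall(line):
            if not urlsplit(iri).scheme:
                yield iri


# Hypothetical sample: the second triple has a relative IRI as its object.
sample = [
    '<http://example.org/agency/1> <http://example.org/name> "Metro" .',
    '<http://example.org/agency/1> <http://example.org/url> <www.metromadrid.es> .',
]
found = list(relative_iris(sample))
print(found)  # ['www.metromadrid.es']
```

Because it streams line by line, the same function can be pointed at an open file handle for a large dump without loading it into memory.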
The remaining data seems to be valid RDF. However, the datatype of the values for the properties arrivalTime and departureTime is specified as xsd:duration while the values are not valid durations (under D-entailment).
Yes! I thought I had removed all the xsd:duration datatypes since, again, the VIG generator does not support them. I'll clean up and fix the mappings. Please use the ones from this official GitHub repo (not the ones from kgc-eval, which may not be up to date).
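The datatype problem can be reproduced with a lexical check. The sketch below tests values against a simplified regular expression approximating the xsd:duration lexical space and shows that a GTFS-style time such as `08:30:00` fails. The conversion helper `gtfs_time_to_duration` is hypothetical and only illustrates what a valid duration literal would look like (for example `PT8H30M0S`); the fix discussed in this thread is to drop the duration datatype instead.

```python
import re

# Simplified xsd:duration lexical form: optional sign, "P", then Y/M/D parts,
# then an optional "T" with H/M/S parts. The lookaheads require at least one
# component after "P" and after "T".
DURATION = re.compile(
    r"-?P(?=\d|T\d)(?:\d+Y)?(?:\d+M)?(?:\d+D)?"
    r"(?:T(?=\d)(?:\d+H)?(?:\d+M)?(?:\d+(?:\.\d+)?S)?)?"
)


def is_valid_duration(value: str) -> bool:
    """Return True if the whole string matches the (simplified) duration form."""
    return DURATION.fullmatch(value) is not None


def gtfs_time_to_duration(value: str) -> str:
    """Hypothetical helper: turn a GTFS time "H:MM:SS" (hours may exceed 23)
    into an xsd:duration lexical value."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return f"PT{hours}H{minutes}M{seconds}S"


print(is_valid_duration("08:30:00"))      # False: not a duration
print(gtfs_time_to_duration("25:10:00"))  # PT25H10M0S
```

GTFS allows hours of 24 or more for trips that run past midnight, which is one reason a plain `xsd:time` would not fit these values either.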
The constructed data seems to be quite redundant. At scale 100, there are more than 5 million different ShapePoints with the exact same latitude and longitude. (Also, there are only 960 distinct values for latitude and 1000 distinct values for longitude.)
This is again a problem with the generator that we rely on. I'll try to take a look at their code to see if it can be solved (my suspicion is that their random generator is not very random).
In any case, it would be nice to improve the benchmark's data generator using SHACL constraints.
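The redundancy figures can be checked directly on the generated CSV. This is a minimal sketch, assuming the standard GTFS `shapes.txt` columns `shape_pt_lat` and `shape_pt_lon`; values are compared as raw strings, so `40.4` and `40.40` count as different. The function name `coordinate_redundancy` and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def coordinate_redundancy(shapes_csv):
    """Summarise how repetitive shape-point coordinates are in a shapes.txt file."""
    lats, lons, pairs = set(), set(), Counter()
    rows = 0
    for row in csv.DictReader(shapes_csv):
        # Raw string values, no numeric normalisation.
        lat, lon = row["shape_pt_lat"], row["shape_pt_lon"]
        lats.add(lat)
        lons.add(lon)
        pairs[(lat, lon)] += 1
        rows += 1
    top_pair, top_count = pairs.most_common(1)[0] if pairs else (None, 0)
    return {
        "rows": rows,
        "distinct_lat": len(lats),
        "distinct_lon": len(lons),
        "distinct_pairs": len(pairs),
        "most_common_pair": top_pair,
        "most_common_count": top_count,
    }


# Hypothetical sample: two shape points share the same coordinates.
sample = io.StringIO(
    "shape_id,shape_pt_lat,shape_pt_lon,shape_pt_sequence\n"
    "s1,40.41,-3.70,1\n"
    "s1,40.41,-3.70,2\n"
    "s2,40.42,-3.71,1\n"
)
stats = coordinate_redundancy(sample)
print(stats)
```

On a real scaled-up file, pass an open file handle (`open("shapes.txt", newline="")`) instead of the in-memory sample; the `Counter` of coordinate pairs is what exposes points that repeat the exact same latitude and longitude.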
Hi @dachafra,
thanks for the quick reply and clarification 🙂 I'll try using the up-to-date mappings from this repo.
Looking forward to future improvements on the benchmark.
Best regards
Lars
Hi @Lars-H,
Would you mind opening a separate issue for each question? That way I can solve and track all of them!
Sure, I can do that. I'll re-run the process with the updated mappings and see which issues remain. Which mappings file should I use to materialize the RDF from a MySQL DB using the rdfizer?
I guess it should be output automatically by the Docker setup. If not, you can use the R2RML mappings with Morph-KGC or Ontop instead of the rdfizer: https://github.com/oeg-upm/gtfs-bench/blob/master/mappings/gtfs-rdb.r2rml.ttl
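For the Morph-KGC route, a configuration with one relational data source pointing at the R2RML file should be enough, following the INI format described in the Morph-KGC documentation. The sketch only builds that configuration; the helper name `morph_kgc_config`, the local mapping path, the database name and the credentials are placeholders, and the `mysql+pymysql` URL assumes the PyMySQL driver is installed. The materialization call is left as a comment because it needs Morph-KGC and a running MySQL instance.

```python
import configparser
import io


def morph_kgc_config(mapping_path: str, db_url: str) -> str:
    """Hypothetical helper: build a Morph-KGC INI config with one RDB source."""
    # interpolation=None so '%' characters in passwords are kept literally.
    config = configparser.ConfigParser(interpolation=None)
    config["DataSource1"] = {"mappings": mapping_path, "db_url": db_url}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


CONFIG = morph_kgc_config(
    "gtfs-rdb.r2rml.ttl",  # assumed local copy of mappings/gtfs-rdb.r2rml.ttl
    "mysql+pymysql://user:password@localhost:3306/gtfs",  # placeholder credentials
)
print(CONFIG)

# With Morph-KGC installed and the database reachable:
# import morph_kgc
# graph = morph_kgc.materialize(CONFIG)  # returns an rdflib Graph
# graph.serialize(destination="gtfs.nt", format="nt")
```

Generating the configuration in code makes it easy to loop over the benchmark's scale factors by swapping the database name in `db_url`.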
Ok, that worked. The only issue I am seeing now is the mentioned xsd:duration datatype. Should I report it in a separate issue?
Related Issues
- MySQL "LOCAL INFILE" import
- Docker "--pull always" option
- Improve output data compression
- Mysql 8.0: Incorrect DATE value
- Fix shape_dist / shape_dist_traveled inconsistency
- url fixed columns are not mantained in the scaling-up with VIG
- exact_times in CSV is 0 while for RDB it is NULL
- Include fixed jar from VIG
- shape_dist_traveled not found in CSV
- Table names in mysql mappings wrong
- Mappings producing different number of results
- Enable passing parameters via env vars or a config file
- gtfs:zone is an object property in the ontology but data property in the mappings
- Service-Calendar and Shape-shapePoints are joins without conditions
- Queries with booleans in the triple patters do not produce result is ontop
- gtfs:distanceTraveled datatype
- Include PostgreSQL and Oracle schema SQL files when generating
- Change YARRRML translator to yatter
- Include in the ontology all properties and classes