Comments (9)
Hi,
I think it would help if you could elaborate a bit more on the data you want to process (which KB? which version? from where?).
Turtle (ttl) makes use of prefixes, which means that specific subsets are not always well-defined when taken out of context.
So, in general, it is not straightforward to convert smaller ttl files into nt files.
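To make the prefix issue concrete, here is a minimal hand-rolled sketch (not a real Turtle parser; the prefix table and the example triple are illustrative) of how prefixed names must be expanded into the absolute IRIs that N-Triples requires — which is exactly what fails if a ttl subset is cut off from its `@prefix` header:

```python
# Illustrative only: expanding Turtle prefixed names into the
# absolute IRIs that N-Triples requires.
prefixes = {
    "wd": "http://www.wikidata.org/entity/",
    "wdt": "http://www.wikidata.org/prop/direct/",
}

def expand(term: str) -> str:
    """Turn a prefixed name like wd:Q42 into an absolute IRI."""
    prefix, _, local = term.partition(":")
    # raises KeyError if the @prefix declaration is missing,
    # i.e. the subset was taken out of context
    return f"<{prefixes[prefix]}{local}>"

# One Turtle statement, rewritten as an N-Triples line:
triple = ("wd:Q42", "wdt:P31", "wd:Q5")
print(" ".join(expand(t) for t in triple) + " .")
```

Without the `@prefix` declarations, `wd:Q42` cannot be resolved at all, which is why an arbitrary slice of a ttl file is not convertible on its own.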
Regards,
Philipp
from wikidata-core-for-qa.
Sorry for my carelessness. The data I want to process is an RDF dump of Wikidata dated January 23, 2023, in ttl format. It can be found here: https://dumps.wikimedia.org/wikidatawiki/entities/20230123/. (Although the website also offers an nt file with the same content as the ttl file, my storage space is not enough for the nt file, so I can only use the ttl file.)
No worries!
So I understand the problem, but do not know of an immediate work-around, unfortunately.
Maybe you could try to recreate smaller ntriple files from the ttl file.
The pruning/cleaning process involves splitting the ntriples file anyway, so you could then skip this step.
Another option, depending on your specific use-case, would be to use a slightly older dump.
We provide already filtered dumps for download here: https://github.com/PhilippChr/wikidata-core-for-QA#Downloads.
Thank you for your reply. It is really helpful. I will have a try.
Hi, thanks for publishing this valuable tool.
My question is similar to @TurquoiseDM's.
I would like to know if it is possible to do this through the following process:
- split a big TTL to some small ttl
- translate them to nt file
- use your code to deal with them one by one?
For the ttl-to-nt conversion, it seems from my observations that I just need to copy the prefix declarations into each ttl file and make sure not to split until all descriptions of an entity are complete.
Another question: does this approach of splitting and then processing the files one by one conflict with your code?
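That splitting strategy could be sketched roughly as follows. This is an untested outline resting on two assumptions (taken from the observation above, not verified against the actual dump): all `@prefix` lines precede the data, and an entity's description is complete once a line ends with ` .`:

```python
def split_turtle(lines, chunk_size):
    """Split Turtle lines into chunks that each repeat the prefix header.

    Assumes every @prefix declaration precedes the data, and that an
    entity's description is complete once a line ends with ' .'.
    """
    header, body, chunks = [], [], []
    for line in lines:
        if line.startswith("@prefix"):
            header.append(line)  # collected once, copied into every chunk
            continue
        body.append(line)
        # only cut after a complete statement, never mid-entity
        if line.rstrip().endswith(" .") and len(body) >= chunk_size:
            chunks.append(header + body)
            body = []
    if body:  # flush the final partial chunk
        chunks.append(header + body)
    return chunks
```

Each returned chunk starts with the full prefix header, so it is a self-contained Turtle document that can be converted to ntriples independently.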
Hi,
yes this sounds like a reasonable approach!
The prefix is indeed important.
In the code, the large ntriples file is also split into several smaller ones that are processed in parallel. We follow a naming convention for these files, so you would need to rename your ntriples files to fit the following format:
(see wikidata-core-for-QA/filter_wikidata.py, line 17 at commit 556093d)
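The renaming step could look roughly like the sketch below. Note that `wikidata_part_{i}.nt` is a made-up placeholder: the real pattern must be taken from line 17 of filter_wikidata.py, which is not reproduced here.

```python
# Hypothetical sketch: "wikidata_part_{i}.nt" is a placeholder pattern,
# NOT the convention used by the repo; substitute the pattern from
# filter_wikidata.py, line 17.
import tempfile
from pathlib import Path

def rename_chunks(directory, pattern="wikidata_part_{i}.nt"):
    """Rename *.nt chunks to a sequential pattern."""
    directory = Path(directory)
    renamed = []
    for i, path in enumerate(sorted(directory.glob("*.nt"))):
        target = directory / pattern.format(i=i)
        path.rename(target)
        renamed.append(target.name)
    return renamed

# demo in a throwaway directory
with tempfile.TemporaryDirectory() as d:
    for name in ("b.nt", "a.nt"):
        (Path(d) / name).touch()
    print(rename_chunks(d))  # ['wikidata_part_0.nt', 'wikidata_part_1.nt']
```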
The line that splits the files in the bash script could then be dropped.
Other than that, I do not see a problem right now.
Regards,
Philipp
You may also want to check out our public API hosted at https://clocq.mpi-inf.mpg.de, which hosts a Wikidata dump from 2022 and provides convenient, QA-specific access to KB functionalities.
Thanks for your detailed reply, it is really helpful. Happy Valentine's Day!
Thank you, same for you! :)