Comments (8)
Thanks for the feedback. We are considering refactoring this code to use another JSON library for parsing (which might also make processing faster). However, the code will not work for Special:EntityData, since that returns a different kind of JSON. The JsonConverter handles the internal JSON that is found only in dump files; this differs from the JSON you get through the API.
from wikidata-toolkit.
Hmm, this is weird and I think the dumps should also use the external JSON format. However, are you planning to support the external JSON format, too, or would it help you if I created a new component for that purpose?
We plan to have a new component for the external format, since we will need this for interpreting API results. The external format is much less messy and more uniform (and it is even documented somewhere, which is not the case for the internal JSON). The internal JSON format is what is actually stored in the database; even if it changed now, the old revisions would remain the same. This is why we have to support several versions of the format -- for eternity ...
There has been discussion about whether all the internal JSON should be rewritten (even in old revisions) to the new format, but this has not happened yet.
We already have support for serializing to the external JSON format. In principle, one could start with the parsing code from scratch, without the burden of the other JSON parser that has to cover all those special cases. We are thinking about using fasterxml.jackson for the new implementation, which leads to much nicer code (see also the discussion at #47).
On 01.06.2014 20:05, Markus Krötzsch wrote:
> We plan to have a new component for the external format, since we will need this for interpreting API result. The external format is much less messy and more uniform (and it is even documented somewhere, which is not the case for the internal JSON). The internal JSON format is what is actually stored in the database; even if it would change now, the old revisions would remain the same. This is why we have to support several versions of the format -- for eternity ...
> It has been discussed if all the internal JSON should be rewritten (even in old revisions) to be the new format, but this has not happened yet.
If we could agree with the people responsible for dumpfile generation
that we could use the external JSON in the dumpfiles, I would strongly
favour it. Converting the old dumps into external JSON would be a
one-time effort, but it becomes bigger the longer one waits.
Having to support legacy dumpfile formats creates a lot of legacy code
burden.
> We already have support for serializing to the external JSON format. In principle, one could start with the parsing code from scratch, without the burden of the other JSON parser that has to cover all those special cases. We are thinking about using fasterxml.jackson for the new implementation, which leads to much nicer code (see also the discussion at #47).
I just got news that the Wikidata team is now starting to plan the gradual conversion of text blobs to the external JSON format. This means that we should give high priority to implementing the parsing of this format, or we will soon start to miss data. The parsing can be independent of the current code, since the two types of JSON will be distinguished (I guess by some content model); so we don't have to decide from the JSON which format we have.
@fer-rum It would be good if you could start working on this next, ideally using the ideas you got for jackson-based parsing. This can go into a package under data model, since it is a very central functionality that will be used for API result parsing as well as for future dump processing.
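The point about distinguishing the two formats up front, rather than inspecting the JSON, can be sketched roughly as below. This is only an illustration under stated assumptions: the class name `FormatDispatch`, the tags `internal-json` and `external-json`, and the placeholder parsers are all hypothetical, since the thread does not say what the actual content-model identifiers or parser classes will be.

```java
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: pick a parser from a format tag instead of guessing from the JSON. */
final class FormatDispatch {

    // Hypothetical tags; the real content-model identifiers are not given in this thread.
    static final String INTERNAL = "internal-json";
    static final String EXTERNAL = "external-json";

    // Each "parser" is reduced to a String -> String function purely for illustration.
    static final Map<String, Function<String, String>> PARSERS = Map.of(
            INTERNAL, json -> "legacy parser handled " + json.length() + " chars",
            EXTERNAL, json -> "new parser handled " + json.length() + " chars");

    /** Routes the payload to the parser registered for the given tag. */
    static String parse(String formatTag, String json) {
        Function<String, String> parser = PARSERS.get(formatTag);
        if (parser == null) {
            throw new IllegalArgumentException("Unknown format tag: " + formatTag);
        }
        return parser.apply(json);
    }

    public static void main(String[] args) {
        System.out.println(parse(EXTERNAL, "{\"id\":\"Q42\"}"));
    }
}
```

With this shape, the legacy parser and the new one never need to know about each other; retiring the old code later would only mean dropping one map entry.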
As @JeroenDeDauw points out, there is already a PHP reference implementation for parsing the datamodel from external JSON now: https://github.com/wmde/WikibaseDataModelSerialization/tree/master/src
The PHP object model may differ from the Java implementation in some minor ways, but it might still help to resolve questions about how to interpret the structure in each case.
Acknowledged, I will start implementing this ASAP.
This problem has now been fixed by the new JSON parsing code that was merged with #91. The JSON conversion is now split across dozens of (smaller) files. It's also much faster. Since the Wikidata dump export code regenerates the format even for old revisions, we should be able to retire the old code completely soon.