
Comments (8)

mkroetzsch commented on August 15, 2024

Thanks for the feedback. We are considering refactoring this code to use another JSON library for parsing (which might also make processing faster). However, the code will not work for Special:EntityData, since that returns a different kind of JSON. The JsonConverter is for the internal JSON found only in dump files; this differs from the JSON you get through the API.
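The distinction between the internal dump JSON and the external API JSON can be sketched with two simplified documents. These snippets are illustrative approximations, not exact Wikibase output, and `looks_external` is a hypothetical helper, not toolkit code (Python is used here for brevity, although the toolkit itself is Java).

```python
import json

# Simplified, hypothetical snippets of the two formats. The old internal
# dump format stored labels as plain language->string maps and the entity
# id as a typed array; the external (API) format uses structured label
# records and a string id. Details here are approximations.
internal = json.loads('{"entity": ["item", 64], "label": {"en": "Berlin"}}')
external = json.loads(
    '{"id": "Q64", "type": "item",'
    ' "labels": {"en": {"language": "en", "value": "Berlin"}}}'
)

def looks_external(doc):
    """Heuristic sketch: external documents carry a string "id" and a
    "labels" map of structured records; internal ones do not."""
    return isinstance(doc.get("id"), str) and "labels" in doc
```

As the comment notes, in practice one should not have to sniff the format like this: the dump code knows it is reading internal JSON, and the API always returns external JSON.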

from wikidata-toolkit.

Benestar commented on August 15, 2024

Hmm, this is weird and I think the dumps should also use the external JSON format. However, are you planning to support the external JSON format, too, or would it help you if I created a new component for that purpose?

mkroetzsch commented on August 15, 2024

We plan to have a new component for the external format, since we will need it for interpreting API results. The external format is much less messy and more uniform (and it is even documented somewhere, which is not the case for the internal JSON). The internal JSON format is what is actually stored in the database; even if it changed now, the old revisions would remain the same. This is why we have to support several versions of the format -- for eternity ...

It has been discussed whether all the internal JSON should be rewritten (even in old revisions) into the new format, but this has not happened yet.

We already have support for serializing to the external JSON format. In principle, one could start with the parsing code from scratch, without the burden of the other JSON parser that has to cover all those special cases. We are thinking about using fasterxml.jackson for the new implementation, which leads to much nicer code (see also the discussion at #47).
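The databind idea behind fasterxml.jackson, mapping JSON directly onto typed classes instead of walking a generic tree, can be sketched as follows. Class and field names here are illustrative assumptions, not the toolkit's actual data model, and the sketch is in Python for brevity although Jackson itself is a Java library.

```python
import json
from dataclasses import dataclass

# Sketch of the databind style: instead of navigating a generic JSON tree
# full of special cases, bind documents straight onto typed records.
# "MonolingualText" and "ItemDocument" are hypothetical names.

@dataclass
class MonolingualText:
    language: str
    value: str

@dataclass
class ItemDocument:
    id: str
    labels: dict  # language code -> MonolingualText

def parse_item(text):
    """Parse an external-format item document (simplified)."""
    raw = json.loads(text)
    labels = {code: MonolingualText(**rec)
              for code, rec in raw.get("labels", {}).items()}
    return ItemDocument(id=raw["id"], labels=labels)

item = parse_item(
    '{"id": "Q64", "labels": {"en": {"language": "en", "value": "Berlin"}}}'
)
```

With Jackson, the equivalent would be annotated Java classes handed to an `ObjectMapper`, which keeps the parsing declarative rather than a hand-written tree walk.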

fer-rum commented on August 15, 2024

On 01.06.2014 20:05, Markus Krötzsch wrote:

> We plan to have a new component for the external format, since we will need this for interpreting API result. The external format is much less messy and more uniform (and it is even documented somewhere, which is not the case for the internal JSON). The internal JSON format is what is actually stored in the database; even if it would change now, the old revisions would remain the same. This is why we have to support several versions of the format -- for eternity ...
>
> It has been discussed if all the internal JSON should be rewritten (even in old revisions) to be the new format, but this has not happened yet.
If we could agree with the people responsible for dumpfile generation that we could use the external JSON in the dumpfiles, I would strongly favour it. Converting the old dumps into external JSON would be a one-time effort, but it becomes bigger the longer one waits.

Having to support legacy dumpfile formats creates a lot of legacy-code burden.

> We already have support for serializing to the external JSON format. In principle, one could start with the parsing code from scratch, without the burden of the other JSON parser that has to cover all those special cases. We are thinking about using fasterxml.jackson for the new implementation, which leads to much nicer code (see also the discussion at #47).


Reply to this email directly or view it on GitHub:
#73 (comment)

mkroetzsch commented on August 15, 2024

I just got news that the Wikidata team is now starting to plan the gradual conversion of text blobs to the external JSON format. This means we should give high priority to implementing parsing for this format, or we will soon start to miss data. The parsing can be independent of the current code, since the two types of JSON will be distinguished (I guess by some content model), so we don't have to decide from the JSON itself which format we have.
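Dispatching on the content model, rather than sniffing the JSON itself, could look roughly like this. The content-model strings and parser functions below are hypothetical placeholders; the real identifiers are whatever Wikibase records for each revision.

```python
import json

def parse_internal(text):
    # Placeholder for the legacy internal-format parser.
    return ("internal", json.loads(text))

def parse_external(text):
    # Placeholder for the new external-format parser.
    return ("external", json.loads(text))

# Hypothetical content-model names; the real strings would come from the
# revision metadata, so no format detection on the JSON is needed.
PARSERS = {
    "wikibase-item-legacy": parse_internal,
    "wikibase-item": parse_external,
}

def parser_for(content_model):
    """Select a parser from the revision's content model."""
    return PARSERS[content_model]

kind, doc = parser_for("wikibase-item")('{"id": "Q64"}')
```

The point of the sketch is that the two parsers never need to coexist in one code path: each revision announces its format up front.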

@fer-rum It would be good if you could start working on this next, ideally using the ideas you got for jackson-based parsing. This can go into a package under data model, since it is a very central functionality that will be used for API result parsing as well as for future dump processing.

mkroetzsch commented on August 15, 2024

As @JeroenDeDauw points out, there is already a PHP reference implementation for parsing the datamodel from external JSON now: https://github.com/wmde/WikibaseDataModelSerialization/tree/master/src
The PHP object model may have some minor differences from the Java implementation, but it should still help resolve questions about how to interpret the structure in each case.

fer-rum commented on August 15, 2024

Acknowledged, I will start implementing this ASAP.

mkroetzsch commented on August 15, 2024

This problem has now been fixed by the new JSON parsing code merged in #91. The JSON conversion is now split across dozens of smaller files, and it is also much faster. Since the Wikidata dump export code regenerates the format even for old revisions, we should soon be able to retire the old code completely.
