Comments (19)
Maybe @FranciscoThiesen could be interested !!!
from simdjson.
I'll give it a shot! @lemire can you assign it to me?
from simdjson.
@FranciscoThiesen Done.
from simdjson.
do you believe the strategy of (efficiently) converting JSON Path -> JSON Pointer and then just leveraging the current at_pointer functionality makes sense and adds value? (at least as a starting point)
I feel this would be valuable.
from simdjson.
@FranciscoThiesen Give it a try.
from simdjson.
@FranciscoThiesen Thank you for the update. Super excited about this functionality becoming available soon.
from simdjson.
@lemire We are interested in this functionality for Velox. Curious if you have a timeline in mind.
from simdjson.
@mbasmanova Work on this feature will start 'soon' (1 week or 2 weeks). JSON Path is quite rich, and it is (if nothing else) challenging to test support. However, we should have partial support in the coming weeks, at the prototypical level.
from simdjson.
This is great. Keep us posted.
from simdjson.
@mbasmanova Sorry for the delay. We fully support JSON Pointer with high performance. Supporting JSON Path with high performance is... challenging. A subset of the language could be supported, but this subset has a significant overlap with JSON Pointer...
from simdjson.
Basically, there are engineering issues involved to do it efficiently. If you don't care about performance, then it is easy, of course, but providing slow code is not in the spirit of this project. So... it is a challenge...
I do recommend people consider JSON Pointer.
from simdjson.
@lemire Daniel, thank you for the update. I'm wondering if you could share some more details. In particular, I'm curious what the challenges are in supporting JSON Path efficiently and what subset can be supported. I haven't looked at JSON Pointer yet, but do you happen to know whether it is possible to automatically rewrite a subset of JSON Path queries into JSON Pointer queries?
from simdjson.
but do you happen to know whether it is possible to automatically rewrite a subset of JSON Path queries into JSON Pointer queries?
Basically JSON Pointer provides forward queries...
Given
{ "c" :{ "foo": { "a": [ 10, 20, 30 ] }}, "d": { "foo2": { "a": [ 10, 20, 30 ] }} , "e": 120 }
You have the following JSON Pointer queries...
- "/c/foo/a/1" is 20
- "/d/foo2/a/2" is 30
- "/e" is 120
The equivalent in JSON Path might be... (up to potential semantics differences)
- $.c.foo.a[1]
- $.d.foo2.a[2]
- $.e
JSON Pointer is a well-established standard.
See https://www.rfc-editor.org/rfc/rfc6901
I should stress that JSON Pointer queries are very much still used in production and the standard is very much alive.
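To make the "forward query" idea concrete, here is a minimal sketch (C++17, standard library only) of how a pointer such as "/c/foo/a/1" breaks down into reference tokens that are applied left to right, following the escaping rules of RFC 6901 ("~1" stands for "/", "~0" stands for "~"). The name split_pointer is hypothetical and is not part of simdjson; this only illustrates the syntax, not how the library resolves pointers internally.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (not a simdjson function): split an RFC 6901 JSON
// Pointer into its unescaped reference tokens.
// Returns std::nullopt when the pointer is malformed, i.e. it is non-empty
// but does not start with '/', or it contains a '~' that is not followed
// by '0' or '1'.
std::optional<std::vector<std::string>> split_pointer(std::string_view pointer) {
  std::vector<std::string> tokens;
  if (pointer.empty()) { return tokens; }  // "" refers to the whole document
  if (pointer.front() != '/') { return std::nullopt; }
  std::string current;
  for (std::size_t i = 1; i < pointer.size(); ++i) {
    const char c = pointer[i];
    if (c == '/') {
      // A slash ends the current token and starts the next one.
      tokens.push_back(current);
      current.clear();
    } else if (c == '~') {
      // Escape sequences: "~0" is a literal '~', "~1" is a literal '/'.
      if (i + 1 >= pointer.size()) { return std::nullopt; }
      const char next = pointer[++i];
      if (next == '0') { current += '~'; }
      else if (next == '1') { current += '/'; }
      else { return std::nullopt; }
    } else {
      current += c;
    }
  }
  tokens.push_back(current);  // the last token has no trailing slash
  return tokens;
}
```

With the document above, split_pointer("/c/foo/a/1") yields the four tokens c, foo, a and 1: three object keys followed by an array index, each one a single step forward into the document.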
We also support an extension whereby you can apply a JSON Pointer from the current node, as in...
auto cars_json = R"( [
{ "make": "Toyota", "model": "Camry", "year": 2018, "tire_pressure": [ 40.1, 39.9, 37.7, 40.4 ] },
{ "make": "Kia", "model": "Soul", "year": 2012, "tire_pressure": [ 30.1, 31.0, 28.6, 28.7 ] },
{ "make": "Toyota", "model": "Tercel", "year": 1999, "tire_pressure": [ 29.8, 30.0, 30.2, 30.5 ] }
] )"_padded;
ondemand::parser parser;
ondemand::document cars = parser.iterate(cars_json);
std::vector<double> measured;
for (auto car_element : cars) {
    // Resolve the pointer relative to the current array element.
    double x = (double) car_element.at_pointer("/tire_pressure/1");
    measured.push_back(x);
}
// measured == {39.9, 31.0, 30.0}
I'm curious what are the challenges
We support JSON Pointer highly efficiently. There is no heap memory allocation and no need for additional dependencies.
As far as I can tell, JSON Path implementations are currently not guaranteed to be efficient.
The current state-of-the-art with respect to attempting to implement JSON Path efficiently is JSONSki but they provide only a partial implementation... It has no support for descendant selectors, and their wildcard selector implements only a part of the JSONPath specification, stepping into every entry of an array, but not into every field of an object.
- Jiang, L., & Zhao, Z. (2022, February). JSONSki: streaming semi-structured data with bit-parallel fast-forwarding. In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (pp. 200-211).
The type of JSON Path queries that would be challenging to implement efficiently are queries such as $.*[[email protected] < 10]..a[?search(@.b, | {"b": "j"}].
It is doable if you have enough engineering effort, and I am not closing this issue. In fact, I am marking it as 'help needed' and 'good first issue'. A couple of talented engineers could implement JSON Path on top of, say, the On Demand API. But it would take more than a few days. I would be interested in working on this, and I might still work on this, but it is not trivial.
from simdjson.
@lemire do you still think this is a good-first-issue?
Issue looks challenging and interesting.
from simdjson.
@FranciscoThiesen It can be quite challenging, and maybe difficult as a starting point. However, you are welcome to give it a try, it might prove to be easier than I anticipate. Furthermore, it is not necessary to implement the full specification.
from simdjson.
@lemire Daniel, thank you for the detailed explanation. I think I'm getting it. It sounds like we could support a subset of JSON Path that can be rewritten into JSON Pointer.
from simdjson.
@mbasmanova Yes, such support could be done relatively quickly.
from simdjson.
I took some time this weekend to familiarize myself with the codebase, the PRs that introduced JSON Pointer support over the past years, and some JSON Path resources such as https://goessner.net/articles/JsonPath/.
@lemire @mbasmanova do you believe the strategy of (efficiently) converting JSON Path -> JSON Pointer and then just leveraging the current at_pointer functionality makes sense and adds value? (at least as a starting point)
The JSON Path -> JSON Pointer conversion appears to be much simpler than implementing an at_path() method from scratch.
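As a rough illustration of that strategy, here is a minimal sketch (C++17, standard library only) that rewrites a small JSON Path subset into a JSON Pointer string. The function name json_path_to_pointer and the exact subset are assumptions made for this example, not simdjson API: it accepts only the root "$", dot children such as .foo (ASCII letters, digits and underscores), quoted bracket children such as ['foo'], and non-negative array indexes such as [1]. Wildcards, descendant selectors (..), filters, slices and negative indexes have no JSON Pointer equivalent, so they are rejected.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Append one key to a JSON Pointer, escaping it per RFC 6901
// ('~' becomes "~0", '/' becomes "~1").
static void append_token(std::string &out, std::string_view key) {
  out += '/';
  for (const char c : key) {
    if (c == '~') { out += "~0"; }
    else if (c == '/') { out += "~1"; }
    else { out += c; }
  }
}

// Hypothetical helper (not part of simdjson): convert a restricted JSON Path
// into a JSON Pointer. Returns std::nullopt for anything outside the subset.
std::optional<std::string> json_path_to_pointer(std::string_view path) {
  if (path.empty() || path.front() != '$') { return std::nullopt; }
  std::string pointer;
  std::size_t i = 1;
  while (i < path.size()) {
    if (path[i] == '.') {
      // Dot child: .name, where name is made of ASCII letters, digits, '_'.
      // An empty name rejects "..", ".*" and a trailing dot.
      const std::size_t start = ++i;
      while (i < path.size() &&
             (std::isalnum(static_cast<unsigned char>(path[i])) || path[i] == '_')) {
        ++i;
      }
      if (i == start) { return std::nullopt; }
      append_token(pointer, path.substr(start, i - start));
    } else if (path[i] == '[') {
      ++i;
      if (i >= path.size()) { return std::nullopt; }
      if (path[i] == '\'' || path[i] == '"') {
        // Quoted child: ['name'] or ["name"]. Backslash escapes are not
        // handled in this sketch, so such names are rejected.
        const char quote = path[i];
        const std::size_t start = ++i;
        const std::size_t end = path.find(quote, start);
        if (end == std::string_view::npos) { return std::nullopt; }
        const std::string_view name = path.substr(start, end - start);
        if (name.find('\\') != std::string_view::npos) { return std::nullopt; }
        append_token(pointer, name);
        i = end + 1;
      } else {
        // Array index: digits only, no leading zeros. Wildcards, filters,
        // slices and negative indexes all fail here or at the ']' check.
        const std::size_t start = i;
        while (i < path.size() && std::isdigit(static_cast<unsigned char>(path[i]))) { ++i; }
        if (i == start) { return std::nullopt; }
        if (i - start > 1 && path[start] == '0') { return std::nullopt; }
        append_token(pointer, path.substr(start, i - start));
      }
      if (i >= path.size() || path[i] != ']') { return std::nullopt; }
      ++i;
    } else {
      return std::nullopt;  // unexpected character after '$' or a segment
    }
  }
  return pointer;
}
```

Under these assumptions, "$.c.foo.a[1]" becomes "/c/foo/a/1", which could then be handed to at_pointer, while a query such as "$..a" or "$.a[*]" returns std::nullopt and would need a real JSON Path engine. The conversion allocates a std::string, so a production version would probably want to avoid that.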
from simdjson.
Just wanted to give an update. I am actively working on it and am currently trying to solve some linker errors.
from simdjson.