cburgmer / json-path-comparison

Comparison of the different implementations of JSONPath and language agnostic test suite.

Home Page: https://cburgmer.github.io/json-path-comparison/

License: GNU General Public License v3.0

Shell 56.45% Clojure 0.63% Rust 1.70% JavaScript 2.57% Python 9.08% Java 2.17% PHP 1.70% Ruby 0.48% Kotlin 0.52% Dockerfile 3.33% Go 4.44% C 1.68% Haskell 0.77% Erlang 1.61% Scala 0.86% C++ 0.46% C# 5.44% Elixir 4.05% Raku 0.22% Objective-C 1.85%

jsonpath comparison test-suite standard

json-path-comparison

See https://cburgmer.github.io/json-path-comparison/ for the table generated from the queries in ./queries.

Goals

Show implementation status of well established implementations.
Inform emerging specification on existing de facto standard.
Support implementers with test cases.

How to

Regression test suite

If you are an author of an upstream implementation, you can use the report generated here to test for regressions in your logic. The regression_suite/regression_suite.yaml holds all queries and includes a consensus where one exists. Additionally a report is generated for every implementation which contains current results for queries where the consensus isn't matched or no consensus exists (see e.g. regression_suite/Clojure_json-path.yaml).

See for example the Clojure json-path regression test on how those files can be put to use.

(Re-)Run the comparison locally

To update the reports checked into Git under ./docs and others, run:

./src/with_native.sh ninja
open docs/index.html

Alternatively, you can use Docker to provide the dependencies via

./src/with_docker.sh ninja

This will take a while and some network bandwidth but has the benefit that you won't have to install anything locally.

One-off comparisons

You can quickly execute a query against all implementations by running:

echo '{"a": 1}' | ./src/with_native.sh ./src/one_off.sh '$.a'

(Or use ./src/with_docker.sh if you prefer Docker.)

Errors

Some of the complexity sadly brings its own set of errors:

If Ninja fails, the failing step is unlikely to be the last (as it will let parallel requests finish first). Search for FAILED to identify the failing step. The error is most likely captured in the output file (the part behind the >). Debug from there.
Some executions might run into timeouts rather randomly (especially when the machine is under high load). The timeout mechanism is necessary as not all implementations play nice, but it will sometimes skew the results. Currently the best fix is to remove the output of the query that ran into a timeout, e.g. rm -r build/results/bracket_notation_with_number_on_short_array for a whole query, and re-run Ninja to force a re-build.
Docker might fail building on re-runs due to an outdated package index. Quickest fix is to run docker rmi json-path-comparison and start from scratch.
Out of memory on Docker: Some compile steps (looking at you, Haskell) seem to need a lot of memory. Increasing the available memory for Docker should help.
In some regions, download speeds for build requirements from the official sites can be unbearably slow. However, certain implementations may be able to use the nearest mirror site via environment variables.
- For ./src/with_docker.sh: Write to ./src/docker_env_file.txt in accordance with the --env-file option specified in the docker run command.
- For ./src/with_native.sh: Directly export environment variables.
If docker build fails on M1 with Colima, maybe https://www.tyler-wright.com/using-colima-on-an-m1-m2-mac/ helps.

json-path-comparison's Issues

Let's talk about IETF standardization of JSONPath

Some of you are already aware that we are looking into creating a standards-track specification for JSONPath, in the same way that RFC 6901 serves as a specification for JSON Pointer. We simply want to use the more powerful JSONPath in other standards, and that doesn't work well if there is not a standards document we can point to.

We created a strawman document in https://tools.ietf.org/id/draft-goessner-dispatch-jsonpath-00.html -- this is the way the IETF works, we like to have concrete documents to talk about. But the actual work of course still needs to be done.

I could imagine that the amazing work done by this community, and in particular the Proposal A you are converging on, could provide a significant input to get this right.

Next week we will have IETF 108 (as an online meeting), and we will need to discuss where to host JSONPath in the IETF.
I made a quick 1:43 video for introducing this discussion: https://youtu.be/Ujch6Wukjc0

That discussion is maybe less important, but as the next item we'll need to think about the exact goals we have in mind for this activity. So if you have opinions about this, maybe we can use this issue to collect them. And maybe we also need to have this discussion so you can be comfortable that we are not going to ignore your input.

How to show error states?

It seems the spec (https://goessner.net/articles/JsonPath/) doesn't call out error states, and as such it seems the different implementations have found different solutions.

Goessner's jsonpath: returns false, same for queries with syntax errors
Clojure + Java: throw exceptions
JavaScript jsonpath-plus: returns undefined (not a valid JSON response)
Python: returns an empty list []
Rust: returns null

This issue is not for finding a preferred response, but for deciding how to flag this in the table.

dotNET builds fail on native after running against docker

It seems the temp files dotNET puts in ./obj/ interfere across platforms:

FAILED: implementations/dotNET_Json.NET/build/Dotnet_Json.NET
implementations/dotNET_Json.NET/install.sh $(basename $(dirname implementations/dotNET_Json.NET/build/Dotnet_Json.NET))
Microsoft (R) Build Engine version 16.5.0+d4cbfca49 for .NET Core
Copyright (C) Microsoft Corporation. All rights reserved.

  Restore completed in 130.86 ms for ./json-path-comparison/implementations/dotNET_Json.NET/Dotnet_Json.NET.csproj.
/usr/local/share/dotnet/sdk/3.1.201/Sdks/Microsoft.NET.Sdk/targets/Microsoft.PackageDependencyResolution.targets(234,5): error NETSDK1047: Assets file './json-path-comparison/implementations/dotNET_Json.NET/obj/project.assets.json' doesn't have a target for '.NETCoreApp,Version=v3.1/osx-x64'. Ensure that restore has run and that you have included 'netcoreapp3.1' in the TargetFrameworks for your project. You may also need to include 'osx-x64' in your project's RuntimeIdentifiers. [./json-path-comparison/implementations/dotNET_Json.NET/Dotnet_Json.NET.csproj]

Build FAILED.

/usr/local/share/dotnet/sdk/3.1.201/Sdks/Microsoft.NET.Sdk/targets/Microsoft.PackageDependencyResolution.targets(234,5): error NETSDK1047: Assets file './json-path-comparison/implementations/dotNET_Json.NET/obj/project.assets.json' doesn't have a target for '.NETCoreApp,Version=v3.1/osx-x64'. Ensure that restore has run and that you have included 'netcoreapp3.1' in the TargetFrameworks for your project. You may also need to include 'osx-x64' in your project's RuntimeIdentifiers. [./json-path-comparison/implementations/dotNET_Json.NET/Dotnet_Json.NET.csproj]
    0 Warning(s)
    1 Error(s)

Time Elapsed 00:00:04.81

Deleting them via rm -rf ./json-path-comparison/implementations/dotNET_*/obj works around that.

Add other implementations

There is a quite popular C++ library, https://github.com/danielaparker/jsoncons, that includes a JSONPath implementation. It would be cool to add it to your comparison.

Divergent behaviour with large array indices in Proposal A

The following demonstrates an example:

$ echo '[0,1,2,3,4]' | ./run.sh '$[2:113667776004]'
...
FATAL ERROR: invalid array length Allocation failed - JavaScript heap out of memory
...

This failure is analogous:

$ echo '[0,1,2,3,4]' | ./run.sh '$[113667776004:0:-1]'
...
FATAL ERROR: invalid array length Allocation failed - JavaScript heap out of memory
...

This class of divergent behaviour was found by fuzz testing in vmware-labs/yaml-jsonpath#30.
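
One way to avoid this class of failure is to clamp the slice bounds to the array length before iterating, so the work is bounded by the size of the input rather than by the numbers in the selector. The following is a minimal Python sketch, not Proposal A's code: clamped_slice is a hypothetical helper, and it assumes Python's own slicing semantics (via the built-in slice.indices) for resolving the bounds.

```python
def clamped_slice(values, start, end, step=1):
    """Slice `values` without ever iterating beyond its bounds."""
    if step == 0:
        raise ValueError("step must be non-zero")
    # slice.indices() resolves negative indices and clamps both bounds
    # to the length of the list, however large the requested numbers are.
    lo, hi, st = slice(start, end, step).indices(len(values))
    return [values[i] for i in range(lo, hi, st)]


data = [0, 1, 2, 3, 4]
print(clamped_slice(data, 2, 113667776004))      # [2, 3, 4]
print(clamped_slice(data, 113667776004, 0, -1))  # [4, 3, 2, 1]
```

Whether these are the results Proposal A should produce is a separate question; the point is only that neither query needs more than five iterations.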

Haskell always recompiles

It seems Cabal uses a symlink which ninja detects as a change and hence forces a rebuild every time. This is a performance issue which will slow down repeated builds.

It seems there is an option to not use a symlink (https://cabal.readthedocs.io/en/latest/nix-local-build.html#cabal-v2-install), but it's not available in the current version:

cabal v2-install exe:cabal --install-method=copy --installdir=~/bin

Tried upgrading to 3.0.0.0 but that fails.

@akshaymankar any ideas?

Should `$[?(@.key)]` distinguish between undefined and null in Proposal A?

It seems there is no consensus whatsoever on "filter with value". I've tried catching a variety of types in
https://cburgmer.github.io/json-path-comparison/results/filter_expression_with_value.html, and you can see a mix of responses, with or without

Empty array, object, string
false
null
undefined key
0

My reasoning to reject only the undefined key case for Proposal A was that there is no other way to implement that with the current set in JSONPath. So this would give me the most flexibility.

However this leads to query $[?(@)] becoming completely pointless, because all elements in an array are defined.

Also, it's unclear whether most languages even let you distinguish between a key being absent and a key being present with value null.
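
As one data point, Python can make this distinction once the JSON is parsed into dictionaries. The sketch below is illustrative only: MISSING and lookup are hypothetical names, and the two list comprehensions stand in for two possible readings of $[?(@.key)], not for any implementation's actual behaviour.

```python
import json

MISSING = object()  # sentinel meaning "the key is not present at all"


def lookup(obj, key):
    """Return the value stored under `key`, or MISSING if the key is absent."""
    return obj.get(key, MISSING) if isinstance(obj, dict) else MISSING


doc = json.loads('[{"key": null}, {"other": 1}, {"key": 0}, {"key": "x"}]')

# Existence-based reading: rejects only the element without the key.
exists = [el for el in doc if lookup(el, "key") is not MISSING]

# Truthiness-based reading: also rejects null, 0, false and empty values.
truthy = [el for el in doc if isinstance(el, dict) and el.get("key")]

print(exists)  # [{'key': None}, {'key': 0}, {'key': 'x'}]
print(truthy)  # [{'key': 'x'}]
```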

Are we interested in SQL/JSON Path Expressions?

Looks like SQL has a quasi? standard for something similar to JSONPath. Is it similar enough that we should care?

Improve string escape usability in Proposal A

Currently, Proposal A uses the following (PEG.js) syntax for child names that can appear in brackets with single quotes:

SingleQuotedString
  = x:"\\'" xs:SingleQuotedString { return "'" + xs; }
  / x:"\\\\" xs:SingleQuotedString { return "\\" + xs; }
  / x:[^'] xs:SingleQuotedString { return x + xs; }
  / ''

Using an online evaluator for PEG.js and feeding in selector.peg, observe that the selector:

$['\'', '\\', '\n', '\\n']

parses to:

[
   [
      "children",
      [
         [
            "name",
            "'"
         ],
         [
            "name",
            "\"
         ],
         [
            "name",
            "\n"
         ],
         [
            "name",
            "\n"
         ]
      ]
   ]
]

Firstly, notice the redundancy: both '\n' and '\\n' are parsed as "\n". But, more importantly, notice that '\n' is treated as valid.

A user who forgets which characters need to be escaped might think that '\n' represents a string consisting of a newline character. Allowing this to parse is potentially unhelpful.

If we remove the redundancy, we can fail unsupported escape sequences (and allow for other escape sequences to be added in future).

For example, the alternative syntax:

SingleQuotedString
  = x:"\\'" xs:SingleQuotedString { return "'" + xs; }
  / x:"\\\\" xs:SingleQuotedString { return "\\" + xs; }
  / x:[^'\\] xs:SingleQuotedString { return x + xs; }
  / ''

fails to parse '\n' with the message Expected "'", "\\'", "\\\\", or [^'\\] but "\\" found, but loses no expressive power.
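
To make the intended behaviour concrete, here is a rough Python sketch of the stricter rule. It is not the PEG.js grammar itself: unescape_single_quoted is a hypothetical helper that decodes the text between the single quotes and accepts only the two escapes the alternative syntax allows.

```python
def unescape_single_quoted(body):
    """Decode the text between single quotes, rejecting unknown escapes."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "'":
            raise ValueError(f"unescaped quote at offset {i}")
        if ch == "\\":
            nxt = body[i + 1 : i + 2]  # empty string if the backslash is last
            if nxt not in ("'", "\\"):
                raise ValueError(f"unsupported escape at offset {i}")
            out.append(nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(unescape_single_quoted(r"\'"))   # '
print(unescape_single_quoted(r"\\n"))  # \n  (a backslash followed by n)
try:
    unescape_single_quoted(r"\n")
except ValueError as err:
    print("rejected:", err)
```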

How to structure the discussion of Proposal A

Following on from vmware-labs/yaml-jsonpath#12 (comment):

Yes, let's discuss.
I'd prefer to not branch out into actual implementations, but rather centralise this - quite probably over at json-path-comparison. The goal is to allow other authors to join the discussions (even if we will not capture many of the major implementations it seems).

Yes, a central discussion will be best.

I don't know how to structure this though. We could start with Github issues. If we find this is hard to structure, we could try moving into the wiki later.

Agreed. Issues will be easier to manage, I think. A wiki can easily become sprawling and out of date (and is not easily versioned). Perhaps the relevant issues could be labelled as Proposal A or similar?

I suggest we also raise one issue against each implementation pointing at the list of labelled issues, since some of the authors might be unaware that this discussion is even happening. What do you think?

Lazy installation fails first runs

The run.sh scripts are designed to fetch their dependencies if not provided yet. However, some of them print to stdout while doing so, messing up the JSON output of the script. After a manual re-run the script is then fine.

Expected outcome:

Should not interrupt first run if dependencies are missing.

Erlang needs to be locked to version 22

Because of davisp/jiffy#197 we need to keep Erlang locked to version 22 for now.

Typo: filter_expression_with_substraction

Should be filter_expression_with_subtraction.

I'm just running a local build to produce a PR...

Allow leading zeroes in numeric literals in Proposal A

Looking at https://cburgmer.github.io/json-path-comparison/results/filter_expression_with_equals_number_with_leading_zeros.html makes me wonder if the rationale for not supporting this in Proposal A was to allow for octal notation to be added in future.

It would be good either to allow leading zeroes or document the rationale for not supporting this.

Give same output on timeout

The docker image uses a different binary for timeout and hence provides a different output when commands run over their time budget.

Feature: one off command to quickly compare a query across implementations

Feature: Test scalar/array support where implementation offers both

Negative array slice steps in Proposal A

Proposal A has the following TODO item:

$[::-2]

No consensus, but if we support $[::2] we should probably support this too. The current default values (start 0 and end len(array)) do not work for negative steps as the start and end need to be switched.

Basics

Let's start by agreeing on some basic behaviour of the array slice [start:end:step] for non-negative start, end, and step:

start is the inclusive starting point of the slice (with 0 meaning the first element of the array)
end is the exclusive end point, or "fence post" (see below), of the slice
step is the value we should add on each iteration, and must be non-zero to avoid an infinite loop (unless start >= end)

So, when step is positive, the indices of the slice are the same as the values of i iterated over by this for loop in C:

for (i = start; i < end; i = i + step) {
  ...
}

Notice that if start >= end, the slice is empty. Also, indices which are greater than or equal to the length of the array are discarded.

Some examples are in order:

[0:2:1] applied to [0,1,2] produces [0,1]
[1:1:1] applied to [0,1,2] produces []
[2:9:1] applied to [0,1,2] produces [2]

Default values

Any of the three values may be omitted:

start defaults to 0, so for example [:2:1] covers the first two elements of the array
end defaults to the length of the array, so for example [1::1] covers all except the first element of the array
step defaults to 1, so for example [1:3:] covers the second and third elements of the array

These defaults allow contractions such as [1:] and even [:] and [::].
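
The basics and defaults above can be sketched in Python as follows. apply_slice is a hypothetical helper covering only non-negative start and end and a positive step, with None standing in for an omitted value.

```python
def apply_slice(values, start=None, end=None, step=None):
    """Apply [start:end:step] for non-negative bounds and a positive step."""
    start = 0 if start is None else start
    end = len(values) if end is None else end
    step = 1 if step is None else step
    if start < 0 or end < 0 or step <= 0:
        raise ValueError("this sketch covers only the non-negative case")
    # Capping `end` at the array length discards out-of-range indices.
    return [values[i] for i in range(start, min(end, len(values)), step)]


data = [0, 1, 2]
print(apply_slice(data, 0, 2, 1))     # [0, 1]
print(apply_slice(data, 1, 1, 1))     # []
print(apply_slice(data, 2, 9, 1))     # [2]
print(apply_slice(data, None, 2, 1))  # [0, 1]   i.e. [:2:1]
print(apply_slice(data, 1, None, 1))  # [1, 2]   i.e. [1::1]
print(apply_slice(data, 1, 3, None))  # [1, 2]   i.e. [1:3:]
```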

Negative values

Now let's see how the notation can be extended to include negative values.

If start or end are negative, this is simply a shorthand way of indexing from the end of the array. If start is negative, the starting index is len+start where len is the length of the array being sliced. Similarly for end. This came from Goessner who defines $..book[-1:] to be the last book in the array. (Since 0 indexes the first element of an array, it might be tempting to make -0 index the last element of the array, were it not for the fact that -0 and 0 are equal, at least when treated as integers.)

But what does a negative step mean?

There seem to be three possible interpretations. The first two iterate in the direction start to end, so I've called them "forwards" interpretations. The third iterates in the direction end to start, so I've called it the "backwards" interpretation.

1. Forwards A

"forwards A" corresponds to the for loop:

for (i = start; i > end; i = i + step) {
  ...
}

The assumptions here are that start is still the inclusive starting point, end is still the exclusive end point, and to have any hope of a non-empty slice start > end.

So, for example, [2:0:-1] applied to [0,1,2] produces [2,1]. ~~One problem with this interpretation is that there is no way to include the first value of the array in such a reverse slice.~~ (correction: the way to do this is to omit the end value - see below.)

What should start and end default to? Perhaps length-1 and -1 (literally, not in the earlier sense of a shorthand for length-1). For example, with these defaults [::-1] enumerates the whole array backwards. Unfortunately, these defaults are not consistent with the defaults when step is positive.

2. Forwards B

"forwards B" corresponds to the for loop:

for (i = start - 1; i >= end; i = i + step) {
  ...
}

This time start is re-interpreted to be the exclusive starting point, end to be the inclusive end point.

So, for example, [2:0:-1] applied to [0,1,2] produces [1,0].

What should start and end default to? Perhaps length and 0. For example, with these defaults [::-1] enumerates the whole array backwards. Again, there is a problem with these defaults in that they are not consistent with the defaults when step is positive.

3. Backwards

"backwards" corresponds to the for loop:

for (i = end - 1; i >= start; i = i + step) {
  ...
}

So, for example, [0:2:-1] applied to [0,1,2] produces [1,0].

The same defaults of start and end can be used as when step was positive. So, for example, [::-1] enumerates the whole array backwards.
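
For side-by-side comparison, the three for loops can be transcribed into Python. This is only a sketch: the function names are made up for illustration, start and end are assumed to be explicit and non-negative, step is assumed to be negative, and indices outside the array are discarded.

```python
def pick(values, indices):
    """Keep only the indices that actually exist in `values`."""
    return [values[i] for i in indices if 0 <= i < len(values)]


def forwards_a(values, start, end, step):
    # for (i = start; i > end; i = i + step)
    return pick(values, range(start, end, step))


def forwards_b(values, start, end, step):
    # for (i = start - 1; i >= end; i = i + step)
    return pick(values, range(start - 1, end - 1, step))


def backwards(values, start, end, step):
    # for (i = end - 1; i >= start; i = i + step)
    return pick(values, range(end - 1, start - 1, step))


data = [0, 1, 2]
print(forwards_a(data, 2, 0, -1))  # [2, 1]
print(forwards_b(data, 2, 0, -1))  # [1, 0]
print(backwards(data, 0, 2, -1))   # [1, 0]
```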

Choosing an interpretation

What should guide the choice of interpretation? We have Goessner's original definition of JSONPath and the behaviour of the various implementations.

Goessner states:

[JSONPath] borrows [...] the array slice syntax proposal [start:end:step] from ECMASCRIPT 4.

but unfortunately the corresponding links point at long since abandoned ES4 documents.

The internet archive turned up Brendan Eich’s ES4 slice proposal, which seems to favour the backwards interpretation (emphasis added):

As in Python, a slice operator has the form seq[start:end:step], where start is the starting index, end is the fencepost – one greater than the last index to slice, and step is the optional increment from start if positive or decrement from end if negative. If start is not given, 0 is used. If end is not given, seq.length is used. If start or end is negative, seq.length is added to it. After this step, any value not in [0, seq.length] is clamped to the nearest bound in that interval. If step is undefined, 1 is used; else ToInteger(step) must be non-zero. Any of start, end, and step may be omitted, and trailing colons may be omitted, but at least one colon is required to denote a “full” or “copying” slice: seq[:].

What about the implementations? One comparison test sheds some light on the matter.

Array slice with negative step applies the selector $[3:0:-2] to the array:

["first", "second", "third", "forth", "fifth"]

There is no consensus, but several implementations, including Proposal A, produce:

[
  "forth",
  "second"
]

which agrees with the "forwards A" interpretation while several others produce an empty array which agrees with the "backwards" interpretation.
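
Since the ES4 proposal quoted above opens with "As in Python", it may be worth checking what Python's built-in slicing does with the same query. The snippet below uses only standard list slicing and slice.indices; it says nothing about what JSONPath should do, only what Python does.

```python
values = ["first", "second", "third", "forth", "fifth"]

# The same start, end and step as the selector $[3:0:-2]
result = values[3:0:-2]
print(result)  # ['forth', 'second']

# Bounds that Python substitutes when start and end are omitted and the
# step is negative, for an array of length 5: start=4, end=-1 (literal).
defaults = slice(None, None, -1).indices(len(values))
print(defaults)  # (4, -1, -1)
```

So Python itself matches the "forwards A" result for this query, and its defaults for a negative step correspond to length-1 and a literal -1.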

Absolute path detection should work inside Docker image as well

/json-path-comparison is not detected and currently there are paths leaking in.

Ordering not stable for Golang_github.com-PaesslerAG-jsonpath

The Golang library github.com/PaesslerAG/jsonpath doesn't seem to provide a stable ordering across different systems.

See 823adb0, the diff stems from re-running the build after a PR generated on a different machine.

Add scenario to check for handling of duplicate values in output

This is prompted by ietf-wg-jsonpath/draft-ietf-jsonpath-base#23.

In short, we'd like to find out what the consensus is, if any, around removing duplicates.

Given the instance:

{
  "a": [
    "string",
    null,
    true
  ],
  "b": [
    false,
    "string",
    5.4
  ]
}

The path $.*[0,:5] will select the value "string" three times: the child of a twice and the child of b once.

Three possible outcomes exist:

No duplicate elimination. This would have three "string" instances in the results.
Value-based duplicate elimination. This would remove all but one "string" instance.
Location-based duplicate elimination. This would have two "string" instances, one from a and one from b (relates to #29).
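
The three outcomes can be illustrated with a small Python sketch. The list of matches is written out by hand rather than computed by a JSONPath engine, the location strings use a made-up notation, and the evaluation order shown is an assumption.

```python
import json

# (location, value) pairs selected by $.*[0,:5] on the instance above.
matches = [
    ("$['a'][0]", "string"),  # from index 0
    ("$['a'][0]", "string"),  # from slice :5
    ("$['a'][1]", None),
    ("$['a'][2]", True),
    ("$['b'][0]", False),     # from index 0
    ("$['b'][0]", False),     # from slice :5
    ("$['b'][1]", "string"),
    ("$['b'][2]", 5.4),
]


def dedupe(pairs, key):
    """Keep the first pair for each distinct key, preserving order."""
    seen, kept = set(), []
    for pair in pairs:
        k = key(pair)
        if k not in seen:
            seen.add(k)
            kept.append(pair)
    return kept


no_elimination = [value for _, value in matches]
# Serialise values so that, e.g., true and 1 are not treated as equal.
by_value = [v for _, v in dedupe(matches, key=lambda p: json.dumps(p[1]))]
by_location = [v for _, v in dedupe(matches, key=lambda p: p[0])]

print(no_elimination.count("string"))  # 3
print(by_value.count("string"))        # 1
print(by_location.count("string"))     # 2
```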

Perl version upgrade fixes recursive_wildcard_on_scalar

Bumping alpine to 3.11 in dd42071 triggered a change in Perl_JSON-Path for the recursive_wildcard_on_scalar query.

This is not a bug but raises a concern about how representative the environment is if the implementation has implicit assumptions about the underlying runtime.

Difference between Alpine 3.10 and 3.11 for perl is
This is perl 5, version 28, subversion 2 (v5.28.2)
vs
This is perl 5, version 30, subversion 1 (v5.30.1)

Union with wildcard and number in Proposal A

Proposal A supports a mixture of wildcards and numbers such as $[*,1] whereas the consensus is "not supported".

What's the rationale for this behaviour of Proposal A? Should Proposal A change?

My personal preference is to go further and allow arbitrary mixtures of wildcards and numbers and produce the corresponding values in the output as that seems to make the syntax and semantics more uniform. For example, Goessner (here) allows the selector $[*,1,0,*] which, when applied to the input document:

["a","b","c"]

produces the output:

[
   "a",
   "b",
   "c",
   "b",
   "a",
   "a",
   "b",
   "c"
]

Admittedly, this behaviour is useless in practice, but it should make the behaviour easier to understand (and document) as there are fewer special cases to remember.
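
The uniform semantics amount to evaluating each union member in turn and concatenating the results. A minimal Python sketch, where union is a hypothetical helper that handles only "*" and non-negative indices on an array:

```python
def union(values, selectors):
    """Evaluate each selector in order and concatenate the results."""
    out = []
    for sel in selectors:
        if sel == "*":
            out.extend(values)      # wildcard: every element, in order
        elif 0 <= sel < len(values):
            out.append(values[sel])  # index: at most one element
    return out


print(union(["a", "b", "c"], ["*", 1, 0, "*"]))
# ['a', 'b', 'c', 'b', 'a', 'a', 'b', 'c']
```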

Clamping of array indices in Proposal A

The handling of excessively large and excessively small array indices seems to be inconsistent:

$ echo '[0,1,2,3,4]' | ./run.sh '$[-99]'
[0]
$ echo '[0,1,2,3,4]' | ./run.sh '$[99]'
[]

For consistency, either the first of these should return [] (which would be consistent with Goessner) or the second should return [4].

Note that Python behaviour does not help guide us as it will give an array out of bounds error.

Perhaps there is a good reason for this behaviour, in which case it would be good to document it.
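
The two consistent options can be sketched in Python as follows. Both functions are hypothetical and only illustrate the alternatives; neither is taken from Proposal A.

```python
def index_strict(values, i):
    """Out-of-range indices select nothing, in either direction."""
    n = len(values)
    if i < 0:
        i += n  # a negative index counts from the end
    return [values[i]] if 0 <= i < n else []


def index_clamped(values, i):
    """Out-of-range indices are clamped to the nearest valid index."""
    n = len(values)
    if n == 0:
        return []
    if i < 0:
        i += n
    return [values[max(0, min(i, n - 1))]]


data = [0, 1, 2, 3, 4]
print(index_strict(data, -99), index_strict(data, 99))    # [] []
print(index_clamped(data, -99), index_clamped(data, 99))  # [0] [4]
```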

Grouping queries by features

The table is growing larger and becomes difficult to evaluate by eye. One solution may be tagging each query with features. For example, query $["key"] can be tagged with features bracket notation and double-quoted names; query $.a..* can be tagged with features nested names, dot notation, deep scan and wildcard, and so on.

If the implementation hits any consensus in a query tagged with any given feature, we can say that it supports that feature, and then we can build a matrix of supported feature combinations. For example, that will show that some implementation supports names without quotes in dot-notation but doesn't support them in bracket notation.
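
A rough Python sketch of how such a matrix could be derived. The implementation names and their results are invented for illustration; only the two queries and their feature tags come from the example above.

```python
# Feature tags per query (taken from the example above).
query_features = {
    '$["key"]': {"bracket notation", "double-quoted names"},
    "$.a..*": {"nested names", "dot notation", "deep scan", "wildcard"},
}

# Hypothetical data: whether each implementation matched the consensus.
matches_consensus = {
    "impl_x": {'$["key"]': True, "$.a..*": False},
    "impl_y": {'$["key"]': False, "$.a..*": True},
}


def supported_features(results):
    """A feature counts as supported if any query tagged with it matched."""
    supported = set()
    for query, matched in results.items():
        if matched:
            supported |= query_features.get(query, set())
    return supported


matrix = {
    impl: sorted(supported_features(results))
    for impl, results in matches_consensus.items()
}
print(matrix["impl_x"])  # ['bracket notation', 'double-quoted names']
```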

Clarify "Array index dot notation on object"

Currently our own naming somewhat conflates the different notations, especially Array index dot notation on object (https://cburgmer.github.io/json-path-comparison/results/array_index_dot_notation_on_object.html) seems incorrect, as this is indeed the dot notation for a path on an object. Calling a query Array index dot notation (https://cburgmer.github.io/json-path-comparison/results/array_index_dot_notation.html) also somewhat suggests such a thing is mandated by Goessner's articles.

Let's clarify this.

Consensus across result types

Since #37 we now support finding a consensus independent of the different types of results the implementations give.
We handle

Array vs scalar results,
Empty responses (empty list or null) vs. not found errors.

Those rules now introduce a "mini-consensus" in themselves, as implementations may disagree over the type of the query:

Scalar with result
Non scalar with result
Scalar without result
Non scalar without result

At the time of writing, the following example seems to have implementations argue for either scalar or non-scalar:
https://cburgmer.github.io/json-path-comparison/#array_slice_on_object

It has a clear consensus on no matches found. However:

3 implementations insist that it's a scalar query:

Kotlin_com.nfeld.jsonpathlite (returns null)
Java_com.jayway.jsonpath (returns NOT_FOUND, which it reserves for scalar queries only)
Objective-C_SMJJSONPat (returns NOT_FOUND, which it reserves for scalar queries only)

3 implementations insist that it's a non-scalar query, as all support scalar responses yet return an empty array:

Golang_github.com-PaesslerAG-jsonpath
Golang_github.com-bhmj-jsonslice
Elixir_warpath

This is not a theoretical issue, but one that will decide which of the implementations will fail the consensus, as the results may switch from [] to None or not found error or vice versa.

Now, this might surface an actual problem in the implementations: What would the user expect, that the query always is understood to be of the same type (scalar vs non-scalar) regardless of the document it is executed against, or not?

The problem that I currently see, however, is that a simple majority decides who "wins". The consensus rule of simple majority + 2 is not applied here.

One possible solution could be to ignore the differences and relax the requirement for implementations to be consistent in their response format.

Or we just accept that for the smaller set of implementations which tend to handle scalar responses differently to the majority, we require a less comfortable consensus.

Permission to steal and re-use

So, I shared this with the folks over at JSON Schema (of which I'm a part), and they really liked it. I was wondering if either:

If you'd want to join us and build/manage a similar site for JSON Schema, or
Mind if we stole and repurposed this ourselves.

Your thoughts?

check and X interpretation is misleading on first glance

Let's face it, you'll look at the chart first before reading the text at the top or the legend at the bottom. I did, and I thought the checkmark represented a "correct" result, X an "incorrect" result, and e exceptional errors/crashes. Can these be changed to something clearer? I don't have alternatives off the top of my head but this is something that can be brainstormed for better UX.

It may also be useful to split the e with another symbol to track differences between paths that had exceptions (crashes) and those intentionally thrown during path compiling when invalid/unsupported tokens are input (or similar).

Elixir_ExJsonPath and Elixir_warpath behave differently across Linux and OSX

Running Elixir_ExJsonPath and Elixir_warpath against the Docker image yields different output than on OSX.

Commit 03b9fe9 shows the diff from a build against the Docker image.

Build fails on installing Haskell_jsonpath implementation

I'm trying to perform a build and it fails with the following message.

[301/5999] ./src/query_implementation.sh queries/bracket_notation_with_empty_string implementations/Ruby_jsonpath > 'build/results/bracket_notation_with_empty_string/Ruby_jsonpath'
ninja: build stopped: subcommand failed.

I have no idea where to look to discover what's gone wrong.

Applying filters to objects in Proposal A

The Goessner article and implementations conceive of filters as applying only to arrays. However, Proposal A allows filters to apply to (the values of) objects. The README does not explain the rationale for this decision, but it would be helpful to know.

I'm personally not convinced that this capability is beneficial:

Filtering applies to the values of an object and not to its keys which seems somewhat arbitrary and asymmetric.
The current consensus is that objects cannot be accessed using array indexing, which seems analogous to using a filter to access an object.

Preserve duplicates after filtering in Proposal A

It's not clear why Proposal A removes duplicates after filtering. See the following examples:

I think filtering of an array should preserve ordering and therefore preserve duplicates too.

clean.sh doesn't clean Python dep directories

Incorrect behaviour of logical not

It seems that logical not (!) in filters in Proposal A doesn't work as expected. For example:

$ echo '[0]' | ./run.sh '$[?(true)]'
[0]
$ echo '[0]' | ./run.sh '$[?(!false)]'
[]

Allow dot notation without root in Proposal A

Many of the implementations allow eliding the dot where a dot child appears at the start of the path. See https://cburgmer.github.io/json-path-comparison/results/dot_notation_without_root.html.

This seems like reasonable syntactic sugar and I'd like it to be considered for Proposal A. If this proposal is rejected, it would be great to document the rationale.

Support heterogeneous responses

Responses across implementations deviate not only on a per-query basis, but differences seem to form a more general pattern. One of those patterns is already well supported: Some implementations return a single value when only 1 or 0 matches are possible (e.g. for $.key), while others will always return a list.

For this identified case the comparison already applies a canonical form by equating the different responses. [42] for Goessner's implementation is understood to be equal to 42 for, say, Java's com.jayway.jsonpath. See example https://cburgmer.github.io/json-path-comparison/#dot_notation. Currently the canonical form chooses the list-based representation.

However there seem to be more patterns not yet understood by the comparison:
An empty match [] in Goessner's implementations might correspond to null in Kotlin. Or even a NotFound exception in Java's com.jayway.jsonpath. See example https://cburgmer.github.io/json-path-comparison/#dot_notation_on_object_without_key.

Errors also seem to show a more complex pattern, apart from the NotFound mentioned above:
Some implementations return a SyntaxError if a selector is not understood. See https://cburgmer.github.io/json-path-comparison/#dot_bracket_notation. These could be separated from a more generic evaluation error.

To summarise the patterns:

Majority of implementations respond with multiple items.
Some implementations respond with a list of one item, some with a single item.
Some implementations respond with an empty list, some with the value null, some with a NotFound error.
Majority of implementations respond with a SyntaxError.

Implementation:

To avoid a single implementation being inconsistent in itself, e.g. returning a single item in one case but a list with a single item in a different case, we could preselect the style of each implementation (e.g. 'list based', 'scalar based with null', 'scalar based with NotFound'). We could thus only match, say, a single item response if the implementation is known (and configured) to produce such a pattern.
The majority calculation would have to check each pattern and vote for the one that matches best, i.e. with the highest match count.
We would have to move away from the existing a-priori mechanism to a more flexible rule-based one at the later majority calculation stage.
The consensus result becomes heterogeneous, e.g. a triple for [], null, NotFound.
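
A rough Python sketch of the idea, under several assumptions: the style names are the ones suggested above, NOT_FOUND is a made-up stand-in for a NotFound error, the query is one where at most one match is possible, and the vote is a plain count (the actual consensus rule is stricter).

```python
import json
from collections import Counter

NOT_FOUND = "NotFound"  # hypothetical stand-in for a NotFound error


def canonical(response, style):
    """Map one response to the list-based canonical form."""
    if style == "list based":
        return response
    if style == "scalar based with null" and response is None:
        return []
    if style == "scalar based with NotFound" and response == NOT_FOUND:
        return []
    return [response]  # any other scalar response is a single match


def majority(responses):
    """Return the most common canonical response and its vote count."""
    votes = Counter(
        json.dumps(canonical(response, style), sort_keys=True)
        for response, style in responses
    )
    winner, count = votes.most_common(1)[0]
    return json.loads(winner), count


responses = [
    ([], "list based"),
    (None, "scalar based with null"),
    (NOT_FOUND, "scalar based with NotFound"),
    ([42], "list based"),
]
print(majority(responses))  # ([], 3)
```

One limitation of this sketch: for the 'scalar based with null' style, a genuine null match cannot be told apart from "no match".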

Multiplication and division in Proposal A

Proposal A lists multiplication and division as "to dos". I'm nervous that this is the start of scripting and once these operations are introduced, the proposal may be drawn into implementing some not particularly well scoped subset of JavaScript.

Can we instead agree that scripting is out of scope and drop multiplication and division? If not, how/where do we draw the line?

"Array slice with interval of -1" case doesn't match the path $[2:1]

Given the title, I would expect the path to be $[2:1:-1] (which Manatee.Json will happily return successfully).

Is the -1 implied by the fact that the start is greater than the end? If that's the case, then maybe change the title to "Array slice with implied interval of -1".
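
For comparison only, here is how Python's own slice notation treats the two forms. This is an analogy under the assumption that a JSONPath slice follows Python-like semantics; it says nothing about what any particular JSONPath implementation does, and the sample array is made up.

```python
document = ["first", "second", "third", "fourth"]

# Start greater than end with the default step of 1 selects nothing.
print(document[2:1])     # []

# An explicit step of -1 walks backwards from index 2 and stops before index 1.
print(document[2:1:-1])  # ['third']
```

Under these semantics a step of -1 is never implied by the start being greater than the end.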

Fix Raku install via Docker

The wrap_in_docker script will remove the container after every run (docker run --rm). This is intentional so we start from a clean slate.
Also, installing dependencies locally helps when switching between running ninja natively vs via Docker as it's easier to track which dependencies have been installed via ninja's own task resolution.

When adding Raku this slipped my mind, and so Raku is failing on a second run as the dependencies are lost.

Workaround for now is forcing Ninja to install the dependencies on every run, by first removing the installation:

rm implementations/Raku_JSON-Path/build/zef_installed

Does JSONPath guarantee an ordered list for multiple responses?

Not all implementations seem to return results in order of occurrence (depth-first) for recursive descent queries in the JSON structure, e.g. Java (com.github.jsurfer) for https://cburgmer.github.io/json-path-comparison/results/recursive_key.html and https://cburgmer.github.io/json-path-comparison/results/recursive_wildcard.html.

Is this a bug or just not guaranteed?
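
To make "order of occurrence (depth-first)" concrete, a small sketch of one possible reading: visit each child in document order and descend into it before moving on to its sibling. The function name and the sample document are made up for illustration, and this is not meant as a statement of what JSONPath guarantees.

```python
def descend(node):
    """Yield every value below node, depth-first, in document order."""
    if isinstance(node, dict):
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return
    for child in children:
        yield child
        yield from descend(child)


document = {"a": [1, 2], "b": {"c": 3}}
print(list(descend(document)))  # [[1, 2], 1, 2, {'c': 3}, 3]
```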

jsonslice update

Hi! Since my library https://github.com/bhmj/jsonslice does participate in your comparison chart I would like to kindly inform you that I managed to complete the refactoring and now jsonslice supports deepscan operator (..) as well as some other variants of queries which were not supported earlier. Could you please update your comparison chart https://cburgmer.github.io/json-path-comparison/ to match the current state of jsonslice. Thanks.

Call out canonical ordering in bug reports

Clarify in bug reports that certain queries have no defined ordering.

Perl JSON-Path indeterministic result ordering for recursive selector

The JSON-Path implementation for Perl returns different results for recursive selectors, making repeated runs of ./go store different outcomes.

This issue is tied to #3, i.e. should we decide that there is no expected ordering in results, we could define a canonical ordering so that repeated runs always produce the same stored representation.
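
One way such a canonical representation could be obtained, as a rough sketch: sort the results by their serialised JSON form before storing them. The function name is hypothetical and sorting by serialised text is just one possible choice of ordering, not something the project has settled on.

```python
import json


def canonical(results):
    """Sort results by their serialised form so that any permutation is stored identically."""
    return sorted(results, key=lambda value: json.dumps(value, sort_keys=True))


run_1 = [{"key": "b"}, "a", 1]
run_2 = [1, {"key": "b"}, "a"]
print(canonical(run_1) == canonical(run_2))  # True
```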

Other implementations WIP

Just to keep track of what's already halfway implemented, but not merged yet:

PHP_remorhaz-jsonpath
See PR, #6
Golang_github.com-bhmj-jsonslice
Open question around correct API usage: bhmj/jsonslice#5
~~Golang_github.com-yalp-jsonpath~~
~~Seems to drag down the consensus a lot due to many features missing~~
Perl_JSON-Path
Indeterministic output, #4
Bash_JSONPath.sh
Does not work cross platform (OS X has issues, see bashtools/JSONPath.sh#6)
Postgres_jsonpath
Not reliable, and unclear whether it is a good fit, as SQL doesn't fully implement JSONPath as set out by Goessner
Objective-C_SMJJSONPath
Looking good, but differences between Ubuntu and OSX
Swift_SwiftPath
Currently Ubuntu in Docker seems to disagree for some queries, g-mark/SwiftPath#15
Golang_github.com-spyzhov-ajson
Pending spyzhov/ajson#16

Clarify relationship to jsonpath-standard

Hey, I've lost track a bit of the developments over at https://github.com/jsonpath-standard. However, I understand that while Proposal A was a good start for a discussion, we have since moved on and it has become obsolete. So have all the open issues here related to it.
At the same time there is now a growing reference implementation.

Would the right thing be to replace Proposal A under https://cburgmer.github.io/json-path-comparison/ with the reference implementation? @glyn

I'm happy for the comparison project to continue to document all the implementations out there, and would then change the roadmap accordingly!

Where is the expected result?

First, great page & study.

I'd like to implement this suite in my library to address some of the issues that the suite found with my implementation. However, I can't find where you store the expected result from your queries.

For example, in https://github.com/cburgmer/json-path-comparison/tree/master/queries/array_index_last you have

document.json that contains the JSON data
selector that contains the JSON Path
SCALAR_RESULT that is an empty file (this doesn't download in a git clone operation either)

If I want to run this test in my suite, I need an expected result to compare against. Where can I find this value?

Unreliable Docker build

The command ./src/wrap_in_docker.sh ninja failed to install Raku due to a collection of downloads which were not found:

...
Step 23/24 : RUN apt-get install -y --no-install-recommends rakudo perl6-zef
 ---> Running in c291994bfd9b
Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
  fonts-glyphicons-halflings libgraph-perl libipc-system-simple-perl
  libjs-angularjs libjs-bootstrap libpath-tiny-perl libtommath1 moarvm nqp
Suggested packages:
  valgrind
Recommended packages:
  libunicode-utf8-perl
The following NEW packages will be installed:
  fonts-glyphicons-halflings libgraph-perl libipc-system-simple-perl
  libjs-angularjs libjs-bootstrap libpath-tiny-perl libtommath1 moarvm nqp
  perl6-zef rakudo
0 upgraded, 11 newly installed, 0 to remove and 0 not upgraded.
Need to get 7080 kB of archives.
After this operation, 50.5 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu groovy/universe amd64 fonts-glyphicons-halflings all 1.009~3.4.1+dfsg-1 [117 kB]
Get:2 http://archive.ubuntu.com/ubuntu groovy/universe amd64 libgraph-perl all 1:0.9704-1 [109 kB]
Get:3 http://archive.ubuntu.com/ubuntu groovy/main amd64 libipc-system-simple-perl all 1.30-1 [23.2 kB]
Get:4 http://archive.ubuntu.com/ubuntu groovy/universe amd64 libjs-angularjs all 1.8.0-1 [552 kB]
Get:5 http://archive.ubuntu.com/ubuntu groovy/universe amd64 libjs-bootstrap all 3.4.1+dfsg-1 [124 kB]
Get:6 http://archive.ubuntu.com/ubuntu groovy/main amd64 libpath-tiny-perl all 0.114-1 [42.6 kB]
Get:7 http://archive.ubuntu.com/ubuntu groovy/main amd64 libtommath1 amd64 1.2.0-3 [53.0 kB]
Err:8 http://archive.ubuntu.com/ubuntu groovy/universe amd64 moarvm amd64 2020.05+dfsg-1
  404  Not Found [IP: 91.189.88.142 80]
Err:9 http://archive.ubuntu.com/ubuntu groovy/universe amd64 nqp amd64 2020.05+dfsg-1
  404  Not Found [IP: 91.189.88.142 80]
Err:10 http://archive.ubuntu.com/ubuntu groovy/universe amd64 rakudo amd64 2020.05.1-1
  404  Not Found [IP: 91.189.88.142 80]
Err:11 http://archive.ubuntu.com/ubuntu groovy/universe amd64 perl6-zef all 0.8.4-3
  404  Not Found [IP: 91.189.88.142 80]
Fetched 1020 kB in 0s (2130 kB/s)
E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/universe/m/moarvm/moarvm_2020.05+dfsg-1_amd64.deb  404  Not Found [IP: 91.189.88.142 80]
E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/universe/n/nqp/nqp_2020.05+dfsg-1_amd64.deb  404  Not Found [IP: 91.189.88.142 80]
E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/universe/r/rakudo/rakudo_2020.05.1-1_amd64.deb  404  Not Found [IP: 91.189.88.142 80]
E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/universe/p/perl6-zef/perl6-zef_0.8.4-3_all.deb  404  Not Found [IP: 91.189.88.142 80]
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
The command '/bin/sh -c apt-get install -y --no-install-recommends rakudo perl6-zef' returned a non-zero code: 100

I guess the Docker cache was out of date, so I edited ./src/wrap_in_docker.sh as follows:

...
docker build --no-cache -t "$target_image" "$script_dir"
...

(I don't want to check this in because it will increase everyone's build times, but maybe we need to say something in the README. I guess the problem could occur on any install where downloads have been removed.)

I then got another failure:

[87/10474] (cd implementations/Clojure_json-path && ./lein uberjar) && mv implementations/Clojure_json-path/target/uberjar/json-path-comparison-0.1.0-SNAPSHOT-standalone.jar implementations/Clojure_json-path/build/json-path-comparison.jar && rm -r implementations/Clojure_json-path/target
Downloading Leiningen to /root/.lein/self-installs/leiningen-2.9.1-standalone.jar now...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   650  100   650    0     0   1975      0 --:--:-- --:--:-- --:--:--  1975
100 13.9M  100 13.9M    0     0  1134k      0  0:00:12  0:00:12 --:--:-- 1437k
Retrieving org/clojure/clojure/1.10.0/clojure-1.10.0.pom from central
Retrieving org/clojure/spec.alpha/0.2.176/spec.alpha-0.2.176.pom from central
Retrieving org/clojure/pom.contrib/0.2.2/pom.contrib-0.2.2.pom from central
Retrieving org/clojure/core.specs.alpha/0.2.44/core.specs.alpha-0.2.44.pom from central
Retrieving cheshire/cheshire/5.8.1/cheshire-5.8.1.pom from clojars
Retrieving com/fasterxml/jackson/core/jackson-core/2.9.6/jackson-core-2.9.6.pom from central
Retrieving com/fasterxml/jackson/jackson-base/2.9.6/jackson-base-2.9.6.pom from central
Retrieving com/fasterxml/jackson/jackson-bom/2.9.6/jackson-bom-2.9.6.pom from central
Retrieving com/fasterxml/jackson/jackson-parent/2.9.1.1/jackson-parent-2.9.1.1.pom from central
Retrieving com/fasterxml/oss-parent/33/oss-parent-33.pom from central
Retrieving com/fasterxml/jackson/dataformat/jackson-dataformat-smile/2.9.6/jackson-dataformat-smile-2.9.6.pom from central
Retrieving com/fasterxml/jackson/dataformat/jackson-dataformats-binary/2.9.6/jackson-dataformats-binary-2.9.6.pom from central
Retrieving com/fasterxml/jackson/dataformat/jackson-dataformat-cbor/2.9.6/jackson-dataformat-cbor-2.9.6.pom from central
Retrieving tigris/tigris/0.1.1/tigris-0.1.1.pom from clojars
Retrieving org/clojure/clojure/1.5.1/clojure-1.5.1.pom from central
Retrieving org/sonatype/oss/oss-parent/5/oss-parent-5.pom from central
Retrieving json-path/json-path/2.1.0/json-path-2.1.0.pom from clojars
Retrieving org/clojure/clojure/1.10.1/clojure-1.10.1.pom from central
Retrieving org/clojure/clojure/1.10.0/clojure-1.10.0.jar from central
Retrieving com/fasterxml/jackson/core/jackson-core/2.9.6/jackson-core-2.9.6.jar from central
Retrieving org/clojure/core.specs.alpha/0.2.44/core.specs.alpha-0.2.44.jar from central
Retrieving org/clojure/spec.alpha/0.2.176/spec.alpha-0.2.176.jar from central
Retrieving com/fasterxml/jackson/dataformat/jackson-dataformat-cbor/2.9.6/jackson-dataformat-cbor-2.9.6.jar from central
Retrieving com/fasterxml/jackson/dataformat/jackson-dataformat-smile/2.9.6/jackson-dataformat-smile-2.9.6.jar from central
Retrieving tigris/tigris/0.1.1/tigris-0.1.1.jar from clojars
Retrieving cheshire/cheshire/5.8.1/cheshire-5.8.1.jar from clojars
Retrieving json-path/json-path/2.1.0/json-path-2.1.0.jar from clojars
Compiling json-path-comparison.core
Created /json-path-comparison/implementations/Clojure_json-path/target/uberjar/json-path-comparison-0.1.0-SNAPSHOT.jar
Created /json-path-comparison/implementations/Clojure_json-path/target/uberjar/json-path-comparison-0.1.0-SNAPSHOT-standalone.jar
ninja: build stopped: subcommand failed.

So I removed the --no-cache option and tried again. I hit another failure and increased the Docker VM size to 10 GB to be on the safe side and tried again...

Next failure:

...
[9767/10292] LANG=en_US.UTF-8 LC_ALL= LC_COLLATE=C ./src/compile_implementation_report.sh build/results build/implementations_matching_majority build/consensus implementations/Python_jsonpath2 > regression_suite/Python_jsonpath2.yaml
FAILED: regression_suite/Python_jsonpath2.yaml
LANG=en_US.UTF-8 LC_ALL= LC_COLLATE=C ./src/compile_implementation_report.sh build/results build/implementations_matching_majority build/consensus implementations/Python_jsonpath2 > regression_suite/Python_jsonpath2.yaml
src/shared.sh: line 29: build/results/union_with_duplication_from_array/Python_jsonpath2: No such file or directory
src/shared.sh: line 29: build/results/union_with_duplication_from_array/Python_jsonpath2: No such file or directory
src/shared.sh: line 29: build/results/union_with_duplication_from_array/Python_jsonpath2: No such file or directory
src/shared.sh: line 54: build/results/union_with_duplication_from_array/Python_jsonpath2: No such file or directory
[9768/10292] LANG=en_US.UTF-8 LC_ALL= LC_COLLATE=C ./src/compile_results_report.sh build/results build/implementations_matching_majority build/consensus queries/filter_expression_with_equals_number_with_fraction > build/markdown/results/filter_expression_with_equals_number_with_fraction.md
[9769/10292] LANG=en_US.UTF-8 LC_ALL= LC_COLLATE=C ./src/compile_results_report.sh build/results build/implementations_matching_majority build/consensus queries/bracket_notation_with_number_on_object > build/markdown/results/bracket_notation_with_number_on_object.md
[9770/10292] LANG=en_US.UTF-8 LC_ALL= LC_COLLATE=C ./src/compile_bug_reports.sh build/results build/implementations_matching_majority build/consensus implementations/Clojure_json-path > bug_reports/Clojure_json-path.md
src/shared.sh: line 29: build/results/union_with_duplication_from_array/Clojure_json-path: No such file or directory
src/shared.sh: line 29: build/results/union_with_duplication_from_object/Clojure_json-path: No such file or directory
ninja: build stopped: subcommand failed.

After several failures and having spent most of the day trying, I am running out of steam. The latest failure is:

[1157/1674] LANG=en_US.UTF-8 LC_ALL= LC_COLLATE=C ./src/compile_bug_reports.sh build/results build/implementations_matching_majority build/consensus implementations/Elixir_jaxon > bug_reports/Elixir_jaxon.md
src/shared.sh: line 29: build/results/union_with_duplication_from_array/Elixir_jaxon: No such file or directory
src/shared.sh: line 29: build/results/union_with_duplication_from_object/Elixir_jaxon: No such file or directory
ninja: build stopped: subcommand failed.

Optionally return path to item instead of item itself

Goessner's original post declares that the return type (array form) could either be the found items or the JSON Paths to those items. Is there any intent to check for this kind of output in the future? Have you seen any libraries support this?

return value:
(array|false):
Array holding either values or normalized path expressions matching the input path expression, which can be used for lazy evaluation. false in case of no match.
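
To illustrate the two output forms described in that quote, a minimal sketch that looks up a key recursively and returns either the values or the normalized paths. This is not a JSONPath implementation: the function names, the result_type parameter and the sample document are assumptions, and member names are not escaped when building the paths.

```python
def find_key(node, key, path="$"):
    """Yield (normalized_path, value) for every object member named key, depth-first."""
    if isinstance(node, dict):
        for name, child in node.items():
            child_path = f"{path}['{name}']"
            if name == key:
                yield child_path, child
            yield from find_key(child, key, child_path)
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from find_key(child, key, f"{path}[{index}]")


def query(document, key, result_type="VALUE"):
    """Return matching values or their paths; False if nothing matches, as in the quote above."""
    matches = list(find_key(document, key))
    if not matches:
        return False
    if result_type == "PATH":
        return [path for path, _ in matches]
    return [value for _, value in matches]


document = {"store": {"book": [{"title": "A"}, {"title": "B"}]}}
print(query(document, "title"))          # ['A', 'B']
print(query(document, "title", "PATH"))  # ["$['store']['book'][0]['title']", "$['store']['book'][1]['title']"]
print(query(document, "missing"))        # False
```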

test case filter_regular_expression also depends on single-equals support

The test case filter_regular_expression uses $[?(@.name=~/hello.*/)] as the query string. I noticed that there is a separate test for single-equals-for-comparison usage, something I don't plan on supporting.

This test should be updated to use the double-equal operator ($[?(@.name==~/hello.*/)]), so that the only thing being tested is regex support.