01mf02 / jaq
A jq clone focussed on correctness, speed, and simplicity
License: MIT License
Could you finalize the implementation of ARG from issues/11?
I think that implementing only --arg and not --argjson (as a way to import floats) was a good choice, but you left $ARGS.named behind.
$ARGS.named is a really nice feature to directly transfer a collection of shell variables.
status=1
city="Paris"
year=1984
jq -n --arg status "$status" \
--arg city "$city" \
--arg year $year \
'$ARGS.named'
{
"status": "1",
"city": "Paris",
"year": "1984"
}
The paragraph about division by 0 is outdated. Note also that jq's current (1.6) behavior is dependent on whether the numerator is a literal 0 or a variable equal to 0.
$ jq -n '0 as $n | $n / 0'
jq: error (at <unknown>): number (0) and number (0) cannot be divided because the divisor is zero
$ jq -n '1 as $n | $n / 0'
jq: error (at <unknown>): number (1) and number (0) cannot be divided because the divisor is zero
$ jq -n '0 / 0'
null
$ jq -n '1 / 0'
jq: error: Division by zero? at <top-level>, line 1:
1 / 0
jq: 1 compile error
It would be great if jaq automatically disabled colored error messages when the output is not a TTY.
$ jaq -n '#' 2>&1 | less
ESC[31mError:ESC[0m Unexpected end of input, expected
ESC[38;5;246m╭ESC[0mESC[38;5;246m─ESC[0mESC[38;5;246m[ESC[0m<unknown>:1:2ESC[38;5;246m]ESC[0m
ESC[38;5;246m│ESC[0m
ESC[38;5;246m1 │ESC[0m ESC[38;5;249m#ESC[0m
ESC[38;5;246m ·ESC[0m │
ESC[38;5;246m ·ESC[0m ╰─ Unexpected end of input
ESC[38;5;246m───╯ESC[0m
ESC[31mError:ESC[0m Unexpected end of input while parsing value, expected def, reduce, -, (, if, ., [, {
ESC[38;5;246m╭ESC[0mESC[38;5;246m─ESC[0mESC[38;5;246m[ESC[0m<unknown>:1:2ESC[38;5;246m]ESC[0m
ESC[38;5;246m│ESC[0m
ESC[38;5;246m1 │ESC[0m ESC[38;5;249m#ESC[0m
ESC[38;5;246m ·ESC[0m ┬
ESC[38;5;246m ·ESC[0m ╰── Unexpected end of input
ESC[38;5;246m───╯ESC[0m
Paging stderr through less is a minor use case, but fixing this would help me a lot when running my private shell script that checks jq compatibility using the gojq test cases.
Other jq implementations have a flag to control if the output should be colorized or not:
Tool Name | Auto | On | Off | Docs |
---|---|---|---|---|
jq | default* | -C, --color-output | -M, --monochrome-output | jq |
gojq | default* | -C, --color-output | -M, --monochrome-output | gojq |
yq | default* | -C, --colors | -M, --no-colors | yq |
jaq | default* | ? | ? | |
* Use colors when output is a TTY
Since this project is already using clap and colorized_json, it would be fairly straightforward to add a flag to override the default behaviour.
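The Auto/On/Off behaviour in the table can be sketched as a small decision function. This is only an illustrative Python model, not jaq's code: the mode names "auto", "always" and "never" and both function names are assumptions made for the example.

```python
import io
import sys

def should_color(mode, stream):
    """Decide whether to emit ANSI colors.

    mode is a hypothetical tri-state setting: "auto", "always" or "never".
    """
    if mode == "always":
        return True
    if mode == "never":
        return False
    if mode == "auto":
        # Auto: use colors only when the stream is attached to a terminal.
        return stream.isatty()
    raise ValueError(f"unknown color mode: {mode!r}")

def paint_error(message, mode, stream=sys.stderr):
    """Wrap the "Error:" label in red only when colors are enabled."""
    label = "\x1b[31mError:\x1b[0m" if should_color(mode, stream) else "Error:"
    return f"{label} {message}"
```

With "auto", piping stderr into a pager would produce plain text, because the pipe is not a TTY.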
Here is a jaq-defined definition of combinations/1 with the same semantics as jq's combinations/1. It seems to me that the helper functions, namely decimal2base/1 and combinations/2, are both independently worthy of inclusion in the jaq library (and could in fact also be included in the jq library in the sense that they behave in the same way when run using jq), so I have not folded them into the def of combinations/1.
# Input: a positive integer
# Output: an array representing the number in base b, with the least significant digit first.
def decimal2base(b):
b as $b
| [recurse(if . > 0 then ./$b|floor else empty end) | . % $b]
| if length > 1 then .[:-1] else . end;
# Enumerate all the ways to select m elements from range(0;n) with replacement.
# The output is a stream of arrays of length m.
def combinations(n; m):
n as $n
| m as $m
| [1, []] # state: [i, combination]
| while( (.[1] | length) <= m;
.[0] | [. + 1, decimal2base($n)] )
| .[1]
| [range(0; $m - length) | 0] + reverse;
def combinations(n):
combinations(length; n) as $c
| [ .[$c[]] ] ;
Example:
["a", "b"] | combinations(3)
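For readers who want to check the counting idea outside jq, here is a rough Python model of the same approach: enumerate the integers below n^m, write each in base n, and pad to m digits. The function names mirror the jq definitions above, but the Python code is only a sketch and passes the input array explicitly.

```python
from itertools import product

def decimal2base(n, b):
    """Digits of the non-negative integer n in base b, least significant first."""
    digits = []
    while True:
        digits.append(n % b)
        n //= b
        if n == 0:
            return digits

def combinations(items, m):
    """Yield every length-m selection from items (with replacement)."""
    if m == 0:
        yield []
        return
    n = len(items)
    for i in range(n ** m):
        digits = decimal2base(i, n)
        # Pad with leading zeros and put the most significant digit first.
        padded = [0] * (m - len(digits)) + digits[::-1]
        yield [items[d] for d in padded]
```

The output order agrees with `itertools.product(items, repeat=m)`, which is a convenient cross-check.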
jq has fromdate and fromdateiso8601 builtin functions. Is it possible to parse dates with jaq?
jq has the option --arg, which is useful for passing values without having to escape them:
* `--arg name value`:
This option passes a value to the jq program as a predefined
variable. If you run jq with `--arg foo bar`, then `$foo` is
available in the program and has the value `"bar"`. Note that
`value` will be treated as a string, so `--arg foo 123` will
bind `$foo` to `"123"`.
Named arguments are also available to the jq program as
`$ARGS.named`.
* `--argjson name JSON-text`:
This option passes a JSON-encoded value to the jq program as a
predefined variable. If you run jq with `--argjson foo 123`, then
`$foo` is available in the program and has the value `123`.
echo '{ "k": "old value" }' | jq '.k = $v' --arg v 'new value'
jq allows constructing an object or array from null by updating, but jaq throws errors.
$ jq -n '.x = 0'
{
"x": 0
}
$ jq -n '.[0] = 0'
[
0
]
$ jaq -n '.x = 0'
Error: cannot index null
$ jaq -n '.[0] = 0'
Error: cannot index null
As best I can tell, there is no command-line JSON tool that can speedily and losslessly run the equivalent of .[] against arbitrarily large files with a single JSON entity for which each value in the stream that should be produced by .[] is relatively small.(*)
As a test case, consider 1e9.json generated by:
jq -nr '"[", (range(0;1E9) | "\(.),"), "0]"' > 1e9.json   # 10,888,888,895 bytes
Interestingly:
.[]
because it quickly becomes a memory glutton (I watched it grow to consume 48GB memory);.[]
(it requires about 8GB of memory);.[]
I realize that jaq currently has no "streaming" ambitions, but it would be fantastic from many points of view if jaq could be enhanced to fill what seems to be the current void amongst command-line tools for JSON.
A closely related issue concerns enormous files with a single JSON object with a large number of keys, each value of which is relatively small. In such cases, one would like to be able to use a small-footprint version of to_entries[].
An alternative would be a built-in for economically producing single-key objects as if by keys_unsorted[] as $k | {($k): .[$k]}. Unfortunately I haven't been able to come up with a wonderful name for such a built-in. Perhaps singletons?
Thanks.
(*) jstream comes close, but it loses precision for both very large integers and various other numbers.
There are language-specific JSON libraries for programmers that have some support for streaming large files, but as best I can tell, they are either quite difficult to use for those who don't know the specific language, or do not handle numeric literals losslessly.
[CORRECTION:]
The "JSON Machine" is a library for PHP that can be configured to work losslessly. This is done by the PHP script I wrote at https://github.com/pkoppstein/jm
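To make the requested behaviour concrete, here is a minimal Python sketch of a small-footprint `.[]` over one top-level array. It is not how any of the tools above work; `iter_array_items` and its `chunk_size` parameter are names invented for this example. It reads the input in chunks and only accepts a value once a following delimiter proves the token is complete. Integers keep full precision and non-integers are parsed as `Decimal`, which preserves their digits but not necessarily the exact literal spelling.

```python
import io
import json
from decimal import Decimal

def iter_array_items(fp, chunk_size=65536):
    """Yield the elements of one top-level JSON array read from text stream fp."""
    decoder = json.JSONDecoder(parse_float=Decimal)
    buf = ""
    opened = False
    while True:
        buf = buf.lstrip(" \t\r\n")
        if not opened:
            if buf.startswith("["):
                opened, buf = True, buf[1:]
                continue
            if buf:
                raise ValueError("expected a top-level JSON array")
        else:
            if buf.startswith(","):  # sketch: separators are skipped, not validated
                buf = buf[1:]
                continue
            if buf.startswith("]"):
                return
            try:
                value, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                end = len(buf)  # incomplete (or invalid) token: ask for more input
            # Accept the value only if a delimiter follows it in the buffer.
            if end < len(buf) and buf[end] in " \t\r\n,]":
                yield value
                buf = buf[end:]
                continue
        chunk = fp.read(chunk_size)
        if not chunk:
            raise ValueError("unexpected end of input")
        buf += chunk
```

Memory use is bounded by the chunk size plus the largest single element, which is the property asked for above. Invalid input is only reported at end of file, a simplification a real tool would not accept.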
I'm playing around with the jaq_core module to see if I can integrate it into a project, and I noticed that you are using the single-threaded Rc. I'm wondering what the design decision behind this was? It seems to me that a lot of the processing could likely be done in parallel via rayon and .par_iter().
Would you be open to some contributions to allow parallel processing?
Some functions in jq's builtin.jq work perfectly well in jaq and so, according to my understanding, could easily be made available to jaq users by copying them into jaq's std.jq.
Some examples I've tested:
def INDEX(stream; idx_expr):
reduce stream as $row ({}; .[$row|idx_expr|tostring] = $row);
def normals: select(isnormal);
def finites: select(isfinite);
def IN(s): any(s == .; .);
def IN(src; s): any(src == s; .);
Also, bsearch/1 can easily be adapted by changing $target to target.
The README mentions that string interpolation is supported, so it looks like the following example reveals a bug:
$ jq -n '"\(select(1) | null) xyzzy"'
"null xyzzy"
$ jaq -n '"\(select(1) | null) xyzzy"'
Error: Unexpected token while parsing string, expected b, r, ", n, t, \, f, /, u
╭─[<unknown>:1:3]
│
1 │ "\(select(1) | null) xyzzy"
· ┬
· ╰── Unexpected token (
───╯
Error: Unexpected end of input while parsing string, expected ", \
╭─[<unknown>:1:28]
│
1 │ "\(select(1) | null) xyzzy"
· │
· ╰─ Unexpected end of input
───╯
I am attempting to write a script that will update a "Parts" file with AWS ETags and part numbers. These ETags are enclosed in quotes, and I am getting an error when escaping them. Is this a known issue, or is there a workaround? (I would take a look in the code, but I don't know Rust.)
Empty json file to start
{ "Parts": [] }
Attempt to add a new item with escaped quotes
▶ jaq '.Parts += [{"ETag": "\"638650f144d2438b850c4530ad0249ce\"", "PartNumber": 1}]' parts.json
Error: Unexpected token, expected :, ,, (, -, ;, >, ), ], ., ?, <, !, {, =, %, 0, *, ", +, |, [, }, $, /
╭─[<unknown>:1:56]
│
1 │ .Parts += [{"ETag": "\"638650f144d2438b850c4530ad0249ce\"", "PartNumber": 1}]
· ┬
· ╰── Unexpected token \
───╯
Error: Unexpected token, expected //, <, -=, <=, }, +=, ?, *=, =, |=, +, ., ==, %=, ,, >, or, /, |, as, /=, %, !=, >=, *, -, [, and
╭─[<unknown>:1:24]
│
1 │ .Parts += [{"ETag": "\"638650f144d2438b850c4530ad0249ce\"", "PartNumber": 1}]
· ───┬──
· ╰──── Unexpected token 638650
───╯
Currently jaq crashes on the modulo operator if the right-hand side is 0. It should report an error (or emit nan).
$ jq -n '0 % 0'
jq: error (at <unknown>): number (0) and number (0) cannot be divided (remainder) because the divisor is zero
$ jaq -n '0 % 0'
thread 'main' panicked at 'attempt to calculate the remainder with a divisor of zero', jaq-core/src/val.rs:382:40
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
jaq --version currently only gives the "release" version, so it is not possible to use the output of this flag to distinguish between various git commits.
If it is too much trouble to give at least some part of the git sha, then it would still be helpful to distinguish the "official release" from a subsequent version.
This is not urgent, but it would be very nice.
I would prefer to slurp the contents of files directly into variables, rather than using the position that a file happens to appear in the stream. I find it nicer to work with keyword parameters than to work with positional parameters. :)
I admit that this is perhaps largely a matter of personal preference. I'm curious how the crowd feels about this. Certainly, I can do everything I need with the current behavior of slurp, but this feature would make the resulting jaq code easier to follow.
Some discrepancies with jq/gojq:
$ jaq --version
jaq 0.8.2
(1) jaq is missing:
getpath
setpath
path
paths
(2)
$ jq -n 'limit(3;range(0;infinite))'
0
1
2
$ gojq -n 'limit(3;range(0;infinite))'
0
1
2
$ jaq -n 'limit(3;range(0;infinite))'
Error: cannot use null as integer
(3)
$ jq -n 'limit(3;range(0;1.4))'
0
1
$ gojq -n 'limit(3;range(0;1.4))'
0
1
$ jaq -n 'limit(3;range(0;1.4))'
Error: cannot use 1.4 as integer
$
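The jq and gojq outputs in (2) and (3) are what a plain counting loop produces when the upper bound is compared as a number rather than converted to an integer. A small Python sketch of that reading (`jq_range` is a name made up here, and `islice` stands in for `limit`):

```python
import math
from itertools import islice

def jq_range(start, upto):
    """Lazily count from start in steps of 1 while the value is below upto.

    upto may be fractional or infinite; it is only compared, never truncated.
    """
    x = start
    while x < upto:
        yield x
        x += 1
```

Because the generator is lazy, `islice(jq_range(0, math.inf), 3)` stops after three values, just as `limit(3; ...)` does.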
Hello,
I have been using jq for years, but I only discovered jaq today.
It reminded me of a detail that I have thought about and forgotten many times.
When I read
def isnormal: isnumber and ((. == 0 or isnan or isinfinite) | not);
I think: "isnumber and (is zero or nan or infinite) ... oh wait, the condition is negated! But where?"
Then I read it again to find the matching parentheses and check the double parentheses once more.
I think it would be easier to read and understand with a not(condition) syntax instead of condition|not.
def isnormal: isnumber and not(. == 0 or isnan or isinfinite);
My goal is not to add a not(condition) function just to patch isnormal (I only took it as an example), but to make it available to any third-party code.
I don't know whether not(condition) would have a performance impact.
Feel free to give feedback!
I was wondering if it would be possible to use jaq-core or jaq-std as a library. I have tried, but I get an unresolved import for jaq_core::Ctx. Thanks.
The use case I've encountered: editing an executable file in-place removes the executable permission from the file.
BTW, love the project 👍🏼
The main difference between serde_json and serde_yaml seems to be that in YAML, map keys can be any value, whereas in JSON, map keys can be only strings.
We could lift the restriction on the key type to allow for YAML support quite quickly; however, the main blocker seems to be colored output of YAML values. This would require a crate like colored_yaml, which does not exist for now.
The isnormal filter originates from the function of the same name in math.h, and should yield false against 0.
$ jq -n '0 | isnormal'
false
$ jaq -n '0 | isnormal'
true
Also, the filter yields a puzzling error on non-number input.
$ jq -n '"" | isnormal'
false
$ jaq -n '"" | isnormal'
Error: cannot negate ""
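For reference, the math.h notion of "normal" can be written down in a few lines of Python. This is a sketch of the expected results, not jaq's or jq's implementation; the choice to return False for non-numbers follows the jq output shown above.

```python
import math
import sys

def isnormal(value):
    """True for finite, non-zero, non-subnormal numbers; False for anything else."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False  # non-numbers are simply "not normal", no error
    try:
        x = float(value)
    except OverflowError:
        return False  # too large for a double, so it would be infinite
    # sys.float_info.min is the smallest positive normal double.
    return math.isfinite(x) and abs(x) >= sys.float_info.min
```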
When we have a filter like <f><path>, then f is currently strictly evaluated, whereas if we write <f> | .<path>, f is lazily evaluated.
However, both should behave the same.
This shows for example when calculating Fibonacci numbers:
limit(10; [1, 1] | recurse([.[1], add]) | .[0])
yields 10 Fibonacci numbers, but
limit(10; [1, 1] | recurse([.[1], add])[0])
does not terminate.
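The difference between the two forms is easiest to see with generators. In this Python sketch (an analogy only, with `recurse` written for illustration), the stream is produced on demand, so taking ten items terminates even though the stream is infinite. Collecting the whole stream before indexing, which is what strict evaluation amounts to, would never finish.

```python
from itertools import islice

def recurse(f, value):
    """Lazily yield value, then recurse into every output of f(value)."""
    yield value
    for nxt in f(value):
        yield from recurse(f, nxt)

def fib_pairs():
    """Infinite stream of [a, b] pairs, like [1, 1] | recurse([.[1], add])."""
    return recurse(lambda p: [[p[1], p[0] + p[1]]], [1, 1])

# Indexing each pair as it arrives keeps the pipeline lazy.
first_ten = [pair[0] for pair in islice(fib_pairs(), 10)]
```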
In the absence of debug/0, it seems to be quite difficult to debug jaq programs. jq's debug/0 is not ideal, but it does the job, and hopefully it or a close look-alike would be easy to implement.
Hi,
I recently discovered jaq and, while trying to run some of my jq-based scripts, noticed that walk(f) didn't seem to be implemented.
Is this something that is planned to be implemented at a later date, or is there an alternative method/function already available to do the same thing?
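For anyone unfamiliar with it, walk(f) applies f to every value bottom-up: children first, then the container that holds them. A minimal Python sketch of that traversal, assuming f returns exactly one result per input (jq filters may return more or fewer):

```python
def walk(f, value):
    """Apply f bottom-up to every element of a JSON-like value."""
    if isinstance(value, dict):
        return f({key: walk(f, item) for key, item in value.items()})
    if isinstance(value, list):
        return f([walk(f, item) for item in value])
    return f(value)

def sort_lists(value):
    """Example callback: sort arrays, leave everything else untouched."""
    return sorted(value) if isinstance(value, list) else value
```

For example, `walk(sort_lists, data)` sorts every array in `data`, including nested ones.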
I have a JSON file that contains 138 instance records, which were grabbed from EC2 describe-instances.
❯ jq 'length' ../tmp/2.json
138
Each record is slightly different from the example in that it is flattened together with an accountId:
{
"accountId": "1234",
"instance": {
"AmiLaunchIndex": 0,
"ImageId": "ami-0abcdef1234567890",
...,
"Tags": [
{
"Key": "domain_name",
"Value": "foo.bar.com"
},
{
"Key": "git_info",
"Value": "V2.8.7.01-123-1111111"
},
{
"Key": "RebootSetting",
"Value": "[{\"Zone\": \"NZ\", \"Default\": {\"MF\": \"7-22\", \"SS\": \"0-0\"}}]"
},
{
"Key": "os_version",
"Value": "20.04"
},
{
"Key": "region",
"Value": "nz"
},
{
"Key": "customer",
"Value": "bar_group"
},
{
"Key": "environment",
"Value": "non-production"
},
{
"Key": "rds",
"Value": "xyz.rds.amazonaws.com"
},
{
"Key": "Name",
"Value": "FOO-BAR"
},
{
"Key": "aws_account_name",
"Value": "FOO-NonProd"
},
{
"Key": "AutoShutdown",
"Value": "True"
},
{
"Key": "AutoStart",
"Value": "True"
},
{
"Key": "application_version",
"Value": "2.8.7"
},
{
"Key": "Create_Auto_Alarms",
"Value": "2022-04-26 02:46:11.030953"
},
{
"Key": "usage",
"Value": "insurance"
}
],
...
}
}
Here is my original jq expression. It filters some fields and converts the tags from { Key: string, Value: string }[] to objects with camelCase keys:
[
.[]
| select(.instance.Tags != null)
| . as $instance
| .instance | ({
"accountId": $instance.accountId,
"imageId": .ImageId,
"instanceId": .InstanceId,
"instanceType": .InstanceType,
"keyName": .KeyName,
"state": .State.Name,
"tags": (.Tags
| map({
key: (.Key | gsub("_(?<a>[a-z])"; .a|ascii_upcase) | (.[0:1] | ascii_downcase) + .[1:]),
value: .Value
})
| sort_by(.key)
| from_entries
)
})
]
However, it takes 20s to run and results in an error:
So I simplified tags to "tags": (.Tags | map({ (.Key): .Value }) | add), but the result is still very slow:
I also have a suggestion, since jaq is a clone of jq. With jq I can do
jq -f ./my_filter.jq ./data.json
With jaq I have to
jaq $(cat ./my_filter.jq) < ./data.json
And the jq filter file cannot contain any special characters such as newlines, because it becomes part of the command-line arguments. This is very inconvenient to use.
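As a cross-check of what the tag conversion in the jq expression above is meant to produce, here is the same transformation sketched in Python. The helper names are made up; the regular expression is the one from the gsub call, so an underscore followed by an uppercase letter is left alone.

```python
import re

def camel_key(key):
    """Turn snake_case into camelCase and lowercase the first character."""
    key = re.sub(r"_([a-z])", lambda m: m.group(1).upper(), key)
    return key[:1].lower() + key[1:]

def tags_to_object(tags):
    """Convert [{"Key": ..., "Value": ...}, ...] into one dict sorted by key."""
    entries = [(camel_key(tag["Key"]), tag["Value"]) for tag in tags]
    entries.sort(key=lambda entry: entry[0])  # mirrors sort_by(.key)
    return dict(entries)  # mirrors from_entries
```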
Description:
I found a use case where jaq is slower:
Commands:
jaq .features[10000].properties.LOT_NUM < citylots.json
jq -cM .features[10000].properties.LOT_NUM < citylots.json
Right now, if I run jaq -i '.some = "attribute"' some.json, the resulting file is not formatted. Personally, I think it should be formatted by default.
Congratulations on 0.10!
I noticed that there's a superfluous line (0) in the documentation of foreach under the example:
seq 1000 | jaq -n 'foreach inputs as $x (0; . + $x)'
In a jq function declaration with duplicate parameter names, the last one wins. In jaq, the first one wins.
$ jq -n 'def f(g;g): g; f(1;2)'
2
$ jaq -n 'def f(g;g): g; f(1;2)'
1
This seems to be a problem in ariadne:
$ jaq ''
Error: Unexpected end of input while parsing value, expected if, reduce, -, {, (, def, ., [
thread 'main' panicked at 'index out of bounds: the len is 0 but the index is 0', /home/mfaerber/.cargo/registry/src/github.com-1ecc6299db9ec823/ariadne-0.1.5/src/source.rs:109:25
The jaq README currently states:
Therefore, unlike jq, jaq satisfies the following paragraph in the jq manual:
An important point about the identity filter is that it guarantees to preserve the literal decimal representation of values. This is particularly important when dealing with numbers which can't be losslessly converted to an IEEE754 double precision representation.
Please note that the given link actually refers to the "development version" of jq. Thus, although your underlying point is fair criticism of jq 1.6 and earlier, the text quoted above really should be revised, ideally to make it clear that the "master" version of jq does indeed deal with literals losslessly, and at least to make it clear that your critique only applies to version 1.6 and earlier.
I believe that no claim was ever authoritatively made about losslessness in jq 1.6 or earlier.
Thanks!
jq has the option --exit-status, which reflects the result of filtering in the exit code.
Excerpt from jq manual:
* `-e` / `--exit-status`:
Sets the exit status of jq to 0 if the last output values was
neither `false` nor `null`, 1 if the last output value was
either `false` or `null`, or 4 if no valid result was ever
produced. Normally jq exits with 2 if there was any usage
problem or system error, 3 if there was a jq program compile
error, or 0 if the jq program ran.
Another way to set the exit status is with the `halt_error`
builtin function.
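The rule in the excerpt for the `-e` case fits in a few lines. The following Python sketch covers only the three outcomes that depend on the outputs (0, 1 and 4); the function name is made up for illustration.

```python
def exit_status(outputs):
    """Exit code under --exit-status, given the sequence of output values.

    0: the last output is neither false nor null
    1: the last output is false or null
    4: no output was produced at all
    """
    outputs = list(outputs)
    if not outputs:
        return 4
    last = outputs[-1]
    return 1 if last is None or last is False else 0
```

Note that only `false` and `null` count as failures: a last output of `0` or `""` still gives exit status 0.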
Without a filter, jq just prints the JSON.
$ jq -n ' "01" | tonumber'
1
$ gojq -n ' "01" | tonumber'
1
$ jaq -n ' "01" | tonumber'
Error: cannot parse "01" as JSON: invalid number at line 1 column 2
I realize the JSON specification does not permit leading 0s in numbers, but it seems to me that that is irrelevant here.
Thanks.
The jq manual states:
jq has a few operators of the form a op= b, which are all equivalent to a |= . op b.
However, I believe that this is actually not true in jq. Proof:
$ jq -cn '{x: 1, y: 2} | .x += .y'
{"x":3,"y":2}
$ jq -cn '{x: 1, y: 2} | .x |= . + .y'
jq: error (at <unknown>): Cannot index number with string "y"
What is the intended semantics of +=, -= then? Should the jq manual be updated to reflect this behaviour?
@itchyny, @pkoppstein, do you have any idea about this?
(I discovered this issue while porting @itchyny's Brainfuck interpreter to jaq, which involved changing .output += [.memory[.pointer]] to .memory[.pointer] as $m | .output += [$m] due to this issue.)
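The two jq outputs above are consistent with the following reading: in `a += b`, b is evaluated against the original input, while in `a |= . + b`, b is evaluated against the value at a. A small Python model of that reading (function names invented here, and restricted to a single top-level key):

```python
def update_assign(obj, key, f):
    """Model of `.key |= f`: f sees only the current value at key."""
    return {**obj, key: f(obj[key])}

def arith_assign(obj, key, rhs):
    """Model of `.key += rhs`: rhs is evaluated against the whole input first."""
    addend = rhs(obj)
    return update_assign(obj, key, lambda current: current + addend)
```

Under this model, `.x += .y` works, whereas `.x |= . + .y` tries to index the number 1 with "y" and fails, matching the error jq reports.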
The filter
jaq -n 'def trees: recurse([., .]); 0 | nth(16; trees) | flatten | length'
gives a stack overflow. Smaller values for nth (e.g. 14) do not produce an overflow.
It might help to debug this with cargo flamegraph to see what causes the recursive calls.
When the array is empty, from_entries returns null instead of an empty object:
$ jaq "from_entries" < <(echo "[]")
null
This means it is not a true inverse of to_entries, which can be problematic if you're not sure whether the object will have any keys but need to ensure that the output remains an object:
$ jaq "to_entries | from_entries" < <(echo "{}")
null
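The expected behaviour follows naturally if from_entries is a fold that starts from an empty object. A Python sketch of the round trip, assuming entries always use the "key" and "value" field names:

```python
def to_entries(obj):
    """Turn {"a": 1} into [{"key": "a", "value": 1}]."""
    return [{"key": key, "value": value} for key, value in obj.items()]

def from_entries(entries):
    """Fold entries into an object, starting from {} so empty input gives {}."""
    result = {}
    for entry in entries:
        result[entry["key"]] = entry["value"]
    return result
```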
A cold run of cargo build --release takes about three minutes.
A lot of this time is spent building jaq-parse.
Luckily, cargo build (without --release) is much faster, but still, it would be nice to have shorter build times in release mode.
I suspect that the types in jaq-parse getting too large is the culprit. It might help to insert boxed() calls here and there to remedy the problem, but the last time I tried that, I got lifetime errors.
Somehow, jaq's flatten yields its result in an unexpected order.
$ jq -nc '[[[0], 1], 2, [3, [4]]] | flatten'
[0,1,2,3,4]
$ jaq -nc '[[[0], 1], 2, [3, [4]]] | flatten'
[2,1,3,0,4]
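The order jq produces is that of a depth-first, left-to-right traversal. A minimal Python sketch of such a traversal, for comparison:

```python
def flatten(value):
    """Flatten nested lists depth-first, keeping left-to-right order."""
    out = []
    for item in value:
        if isinstance(item, list):
            out.extend(flatten(item))
        else:
            out.append(item)
    return out
```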
In Folding (reduce .[] as $x (0, . + $x), foreach .[] as $x (0, . + $x)), the commas should be semicolons.
Also, it would perhaps be less confusing if the basic "three-argument" form of foreach were illustrated, e.g.
foreach .[] as $x (0; . + $x; 2 * .)
Because jaq iterates over the Cartesian product differently, it currently yields results in a different order.
$ jq -n '(1,2) * (3,4)'
3
6
4
8
$ jaq -n '(1,2) * (3,4)'
3
4
6
8
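The two output orders correspond to two ways of nesting the loops. In this Python sketch, the first function reproduces the jq output above (right operand in the outer loop) and the second reproduces the jaq output shown in this report (left operand in the outer loop). The function names are only labels for the example.

```python
def products_rhs_outer(lhs, rhs):
    """Order shown for jq: iterate the right operand in the outer loop."""
    return [a * b for b in rhs for a in lhs]

def products_lhs_outer(lhs, rhs):
    """Order shown for jaq in this report: left operand in the outer loop."""
    return [a * b for a in lhs for b in rhs]
```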
It would be awesome if this were a WebAssembly library wrapped in JS on npm.
Just curious: is there any reason not to tag the release commits? Thanks!
Using jaq, split/1 with an empty string yields an array with empty strings at both ends.
$ jq -n '"abc" | split("")'
[
"a",
"b",
"c"
]
$ jaq -n '"abc" | split("")'
[
"",
"a",
"b",
"c",
""
]
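One way to get the jq result is to treat the empty separator as a special case that splits the string into its characters. A Python sketch of that special case (Python's own `str.split` rejects an empty separator, so the branch is needed here as well):

```python
def split(text, separator):
    """Split text on separator; an empty separator yields the single characters."""
    if separator == "":
        return list(text)
    return text.split(separator)
```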
The jaq vs jq numbers in the Performance section of the README are impressive, but I suspect that your jq numbers are based on a version of jq that has been compiled with assertion-checking turned on, and so am wondering whether you would agree that that would be worth mentioning. Also, since it's been some time (years) since the last official release of jq (v 1.6), some people might also be interested to know that the "master" version is often significantly faster.
It's easy to create a version of jq with NDEBUG in effect: basically, one has only to add -DNDEBUG to DEFS in Makefile.
Here are some illustrative "u+s" timings that I obtained on a 3 GHz machine for running CMD -n empty 128 times:
jaq: 0.49s
jq-1.6: 3.96s
jqMaster: 0.65s
jqMaster.NDEBUG: 0.62s
Would you be interested in any other comparisons?
This works:
$ echo '"\""' | jq '.'
"\""
$ jq -n '"\""'
"\""
Here lies the problem:
$ echo '"\""' | jaq '.'
"\""
$ jaq -n '"\""'
Error: <parse error ...>
jaq should parse JSON strings inside filters like jq.
I was reading the release notes for 0.9.0 and was quite disappointed to read about some of the changes affecting foreach in the BREAKING CHANGE notes.
Obviously breaking changes can often be justified, and discrepancies from jq are to be expected, but some of the changes affecting foreach are, it seems to me, very hard to justify given (a) their utility; (b) the ordinary meaning of "for each" in English; and (c) the fact that gojq also conforms with the jq semantics.
Since 0.9.0 < 1.0, I am hoping that you will reconsider at least some aspects of foreach, e.g. its name.
For example, since you evidently feel strongly about the init value being emitted, one possibility would be for jaq to have the control structure you want under a different name. Let's suppose it was named for. Then (ideally perhaps) we could have our cake and eat it (i.e. have for and foreach), but if you do not wish to have both, then having your control structure as for would at least help avoid confusion.
By the way, congratulations on all the improvements in 0.9.0!
With the --slurp option, jq reads all the files and collects their values into a single array. Currently jaq processes each file individually.
$ jq --slurp . <(echo null) <(echo false) <(echo true)
[
null,
false,
true
]
$ jaq --slurp . <(echo null) <(echo false) <(echo true)
[
null
]
[
false
]
[
true
]
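The jq behaviour amounts to concatenating the value streams of all inputs before wrapping them in one array. A rough Python sketch, where each string stands for the contents of one input file and both function names are invented for the example:

```python
import json

def json_values(text):
    """Yield every whitespace-separated JSON value contained in text."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        while pos < len(text) and text[pos] in " \t\r\n":
            pos += 1
        if pos == len(text):
            return
        value, pos = decoder.raw_decode(text, pos)
        yield value

def slurp(texts):
    """Collect the values of all inputs into one array, as jq --slurp does."""
    return [value for text in texts for value in json_values(text)]
```

Slurping each input separately, as in the jaq output above, would instead call `list(json_values(text))` once per file.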