jqjq

jq implementation of jq

Warning: this project is mostly for learning, experimenting and fun.

Why? It started when I was researching how to write decoders in jq for fq, which ended up involving some syntax tree rewriting and walking, and it grew from there.

But it's also a great way to show that jq is a very expressive, capable and neat language!

You can try and play around with jqjq using jqplay.org.

Use via jqjq wrapper

$ ./jqjq -n 'def f: 1,8; [f,f] | map(.+105) | implode'
"jqjq"

$ ./jqjq '.+. | map(.+105) | implode' <<< '[1,8]'
"jqjq"

# use jqjq via jqjq to run above example
# eval the concatenation of jqjq.jq as a string and the example
$ ./jqjq "eval($(jq -Rs . jqjq.jq)+.)" <<< '"eval(\"def f: 1,8; [f,f] | map(.+105) | implode\")"'
"jqjq"

# jqjq has a REPL
$ ./jqjq --repl
> 1,2,3 | .*2
2
4
6
> "jqjq" | explode | map(.-32) | implode
"JQJQ"
> "jqjq" | [eval("explode[] | .-32")] | implode
"JQJQ"
> ^D

# run 01mf02's adaptation of itchyny's bf.jq running fib.bf
$ ./jqjq -n "\"$(cat fib.bf)\" | $(cat bf.jq)"
"1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233"

$ ./jqjq -h
jqjq - jq implementation of jq
Usage: jqjq [OPTIONS] [--] [EXPR]
  --jq PATH        jq implementation to run with
  --lex            Lex EXPR
  --no-builtins    No builtins
  --null-input,-n  Null input
  --parse          Lex and parse EXPR
  --repl           REPL
  --run-tests      Run jq tests from stdin
  --slurp,-s       Slurp inputs into an array

Use with jq

$ jq -n -L . 'include "jqjq"; eval("def f: 1,8; [f,f] | map(.+105) | implode")'
"jqjq"

$ jq -L . 'include "jqjq"; eval("(.+.) | map(.+105) | implode")' <<< '[1,8]'
"jqjq"

Run tests

make test

Note that the tests are meant to be used with jq 1.7.

Progress

  • 123, .123, 1.23, 1.23e2, 1.23e+2, "abc", true, false, null Scalar literals
    • Unicode codepoint escape "\ud83d\ude03"
    • Handle surrogate pairs \ud800-\udfff, should translate to codepoint.
    • Control code and quote escape "\"\n\r\t\f\b\\\/"
  • {key: "value"} Object literal
    • {key}
    • {"key"}
    • {$key}
    • {(f): f}
    • {("a","b"): (1,2), c: 2} Multiple key/value outputs
    • {"\(f)"} String interpolation
    • {key: 1 | .} Multi value queries
  • [1,2,3] Array literal, collect
  • 1, 2 Comma operator
  • 1 | 2 Pipe operator
  • +, -, *, /, % Arithmetic operators
  • +123, -1 Unary operators
  • ==, !=, <, <=, >, >= Comparison operators
  • 123 as $a | ... Binding
    • (1,2,3) as $a | ... Binding per output
    • {a: [123]} as {a: [$v]} Destructuring binding
  • . Identity
  • .a, ."a", .[1], .[f] Index
  • .key[123]."key"[f] Suffix expressions
    • .a.b Multi index
    • .a.b? Optional index
    • .a[] Iterate index
    • .[]? Try iterate
  • .[] Iterate
  • .[start:stop], .[:stop], .[start:] Array slicing
    • .[{start: 123, stop: 123}] Slice using object
    • Slice and path tracking path(.[1:2]) -> [{"start":1,"end":2}]
  • try f Shorthand for try f catch empty
  • f? Shorthand for try f catch empty
  • and, or operators
  • not operator
  • if f then 2 else 3 end Conditional
    • if f then 2 end Optional else
    • if f then 2 elif f then 3 end Else if clauses
    • if true,false then "a" else "b" end Multiple condition outputs
  • reduce f as $a (init; update) Reduce outputs from f into one output
  • foreach f as $a (init; update; extract) Foreach outputs of f update state and output extracted value
    • Optional extract
  • f = v Assignment
  • f |= v, f += Update assignment
  • +=, -=, *=, /=, %= Arithmetic update assignment
  • eval($expr) (jqjq specific)
  • path(f) Output paths for f
  • input, inputs
  • Builtins / standard library
    • add
    • all, all(cond), all(gen; cond)
    • any, any(cond), any(gen; cond)
    • bsearch($target)
    • capture($val), capture(re; mods)
    • debug (passthrough)
    • debug(msgs)
    • del(f)
    • delpaths($paths) (passthrough)
    • empty (passthrough)
    • endswith($s)
    • error($v) (passthrough)
    • error (passthrough)
    • explode (passthrough)
    • first(f)
    • first
    • flatten, flatten($depth)
    • from_entries
    • fromjson
    • getpath(path) (passthrough)
    • group, group_by(f)
    • gsub($regex; f) (passthrough)
    • gsub($regex; f; $flags)
    • halt_error, halt_error($exit_code)
    • has($key) (passthrough)
    • implode (passthrough)
    • in(xs)
    • index($i)
    • indices($i)
    • isempty
    • join($s)
    • last(f)
    • last
    • length (passthrough)
    • limit($n; f)
    • map(f)
    • match($val)
    • match($regex; $flags) (passthrough)
    • max, max_by(f)
    • min, min_by(f)
    • nth($n; f); nth($n)
    • range($to), range($from; $to), range($from; $to; $by)
    • recurse, recurse(f)
    • repeat
    • reverse
    • rindex($i)
    • scalars
    • select(f)
    • setpath (passthrough)
    • sort, sort_by(f)
    • split($s)
    • split($re; flags)
    • splits($re), splits($re; flags)
    • startswith($s)
    • test($val)
    • test($regex; $flags) (passthrough)
    • to_entries
    • tojson
    • tonumber (passthrough)
    • tostring (passthrough)
    • transpose
    • type (passthrough)
    • unique, unique_by(f)
    • until(cond; next)
    • while(cond; update)
    • with_entries
    • Math functions, sin/0, ... atan/2, ...
    • More...
  • def f: . Function declaration
    • def f(lambda): lambda Lambda argument
    • (def f: 123; f) | . Closure function
    • def f: def _f: 123; _f; f Local function
    • def f($binding): $binding Binding arguments
    • def f: f; Recursion
  • .. Recurse input, same as recurse
  • // Alternative operator
  • ?// Alternative destructuring operator
  • $ENV
  • @format "string" Format string
  • label $out | break $out Break out
  • include "f", import "f" Include
  • Run jqjq with jqjq
  • Bugs

jq's test suite

$ ./jqjq --run-tests < ../jq/tests/jq.test | grep passed
297 of 442 tests passed

Note that expected test values are based on stedolan's jq. If you run with a different jq implementation like gojq some tests might fail because of different error messages, support for arbitrary precision integers etc.

Design overview

jqjq has the common lex, parse, eval design.

Lex

The lexer gets a string and chews off parts from left to right, producing an array of tokens [{<name>: ...}, ...]. Each chew is done by testing regexes in priority order so that longer prefixes are matched first, e.g. += is matched before +. For a match a lambda is evaluated, usually just . (identity), but in some cases, such as quoted strings, it is a bit more complicated.

You can use ./jqjq --lex '...' to lex and see the tokens.
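
The chewing described above can be sketched in jq itself. This is a minimal illustration and not the jqjq lexer: the names `lex_prog`, `_re` and `_token`, the token names and the handful of regexes are all assumptions made for the example.

```sh
# Hypothetical mini lexer, not the jqjq implementation.
lex_prog='
  def lex:
    # Try to match $re at the start of the input string. On a match, output
    # the token built by f and the remaining string, otherwise output nothing.
    def _re($re; f):
      . as $s
      | ($s | match("^(?:" + $re + ")").string) as $m
      | {token: ($m | f), rest: $s[($m | length):]};
    # Rules are tried in priority order: "+=" is listed before the single
    # character operators so that the longer prefix wins.
    def _token:
      _re("\\s+"; {whitespace: .})
      // _re("\\+="; {op: .})
      // _re("[-+*/]"; {op: .})
      // _re("[0-9]+"; {number: tonumber})
      // _re("[a-z_]+"; {ident: .})
      // error("unknown token at: " + .);
    # Chew until the string is empty, then drop whitespace tokens.
    [ {rest: .}
    | recurse(select(.rest != "") | .rest | _token)
    | .token
    | values
    | select(has("whitespace") | not)
    ];
  $src | lex
'
jq -nc --arg src 'a += 1 + b' "$lex_prog"
```

With the input above this should print `[{"ident":"a"},{"op":"+="},{"number":1},{"op":"+"},{"ident":"b"}]`. Moving the `+=` rule below the single character rule would split it into two tokens, which is why the order matters.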

Parse

The parser takes an array of tokens and uses a left-to-right (LR) parser with backtracking, combined with precedence climbing for infix operators so that left-recursive rules (e.g. E -> E + E) do not cause an infinite loop. Backtracking is done by outputting empty for a non-match and using // to try the next rule, e.g. a // b // ... // error, where a and b are functions that each try to match a rule. When a rule matches, it returns an array with the pair [<tokens left>, <ast>]. <ast> uses the same AST design as gojq.

You can use ./jqjq --parse '...' to lex and parse and see the AST tree.
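
As a rough sketch of how backtracking with // and precedence climbing fit together, here is a tiny parser for numbers, + - * / and parentheses. It is not the jqjq parser: the AST shape ({op, left, right}), the stand-in `tokens` lexer, the `prec` table and all rule names are assumptions made for this example, and the real AST follows gojq's design instead.

```sh
# Hypothetical mini parser, not the jqjq implementation.
parse_prog='
  def prec: {"+": 1, "-": 1, "*": 2, "/": 2};
  # Stand-in lexer so the sketch is self-contained.
  def tokens:
    [ match("[0-9]+|[-+*/()]"; "g").string
    | if test("^[0-9]") then {number: tonumber}
      elif test("^[()]") then {paren: .}
      else {op: .} end
    ];
  # Every rule takes the remaining tokens as input and outputs
  # [<tokens left>, <ast>] on a match, or nothing (empty) otherwise.
  def p_expr($min):
    def p_number:
      select(.[0].number != null) | [.[1:], {number: .[0].number}];
    def p_paren:
      select(.[0].paren == "(")
      | (.[1:] | p_expr(0)) as [$rest, $ast]
      | $rest
      | select(.[0].paren == ")")
      | [.[1:], $ast];
    # Backtracking: if p_number outputs nothing, // tries p_paren.
    def p_term: p_number // p_paren;
    # Precedence climbing: only consume operators that bind at least as
    # tightly as $min, and parse the right side with a higher minimum.
    def _climb:
      . as [$tokens, $left]
      | ($tokens[0].op // "") as $op
      | prec[$op] as $p
      | if $p == null or $p < $min then .
        else
          ($tokens[1:] | p_expr($p + 1)) as [$rest, $right]
          | [$rest, {op: $op, left: $left, right: $right}]
          | _climb
        end;
    p_term | _climb;
  # Render the AST with explicit parentheses to make the grouping visible.
  def show:
    if has("number") then .number | tostring
    else "(" + (.left | show) + " " + .op + " " + (.right | show) + ")" end;
  $src | tokens | (p_expr(0) // error("parse error")) as [$rest, $ast]
  | if $rest == [] then $ast | show else error("unexpected token") end
'
jq -nr --arg src '1 + 2 * 3 - (4 - 5)' "$parse_prog"
```

This should print `((1 + (2 * 3)) - (4 - 5))`. Because the right-hand side is parsed with a minimum of `$p + 1`, operators of equal precedence group to the left, and no rule ever calls itself without first consuming a token.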

Eval

Eval traverses the AST, evaluating each node while keeping track of the current path and environment.

Path is used in jq to keep track of where you currently are in the input. This only works for simple indexing (e.g. path(.a[1], .b) outputs ["a",1] and ["b"]). It is also used to implement assignment and some other operators.
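
For reference, this is how paths behave in plain jq, using only the standard `path` and `setpath` builtins. The second command is a sketch of how an assignment such as `(.a[1], .b) = 0` can be expressed in terms of paths; it is not necessarily how jqjq implements it.

```sh
# Output the path of each value selected by the expression
jq -nc '{"a": [10, 20], "b": 30} | path(.a[1], .b)'
# ["a",1]
# ["b"]

# Assignment expressed as "set every selected path to a new value"
jq -nc '{"a": [10, 20], "b": 30} | reduce path(.a[1], .b) as $p (.; setpath($p; 0))'
# {"a":[10,0],"b":0}
```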

Environment is an object with the current functions and bindings. Functions have the key name <name>/<arity> and the value is a function AST. Bindings use the key name $<name>/0 and the value is {value: <value>}, where <value> is a normal jq value.

When evaluating the AST, the eval function gets the current AST node, path and environment and outputs zero, one or more arrays with the pair [<path>, <value>]. Path can be [null] if the evaluation produced a "new" value, so that path tracking is not possible.
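
The [<path>, <value>] protocol can be illustrated with a very small evaluator. This is a sketch under stated assumptions and not the jqjq eval: the function name `ev`, the helper `_sub` and the AST node shapes (`number`, `var`, `index`, `pipe`, `comma`) are made up for the example and differ from the gojq-style AST that jqjq uses. Only the `$<name>/0` key and the `{value: ...}` wrapper follow the description above.

```sh
# Hypothetical mini evaluator, the AST node shapes are made up for this sketch.
eval_prog='
  def ev($ast; $path; $env):
    # Extend the path with a key, keeping the [null] sentinel for "no path".
    def _sub($key): if $path == [null] then [null] else $path + [$key] end;
    if $ast | has("number") then
      # A literal is a "new" value, so there is no path to track.
      [[null], $ast.number]
    elif $ast | has("var") then
      # Bindings live in the environment under the key $<name>/0.
      [[null], $env[$ast.var + "/0"].value]
    elif $ast | has("index") then
      [_sub($ast.index), .[$ast.index]]
    elif $ast | has("pipe") then
      # Feed each [path, value] output of the left side into the right side.
      ev($ast.pipe[0]; $path; $env) as [$p, $v]
      | $v
      | ev($ast.pipe[1]; $p; $env)
    elif $ast | has("comma") then
      ev($ast.comma[0]; $path; $env), ev($ast.comma[1]; $path; $env)
    else error("unknown AST node") end;
  # Hand-written AST for: .a[1], .b, $x
  {comma: [
    {pipe: [{index: "a"}, {index: 1}]},
    {comma: [{index: "b"}, {var: "$x"}]}
  ]} as $ast
  | {"$x/0": {value: 42}} as $env
  | {a: [10, 20], b: 30}
  | ev($ast; []; $env)
'
jq -nc "$eval_prog"
```

This should output `[["a",1],20]`, `[["b"],30]` and `[[null],42]`, one per line: the two index expressions keep a usable path, while the binding produces a value with the [null] sentinel path.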

Problems, issues and unknowns

  • Better error messages.
  • Passing the "environment" around is not very efficient, and it also makes supporting recursion a bit awkward (the called function is injected into the env at call time).
  • The "," operator in jq (and gojq) is left associative, but the way jqjq parses it creates the correct parse tree when it is right associative. Don't know why.
  • Suffix with multiple [] outputs values in the wrong order.
  • Non-associative operators like == should fail, e.g. 1 == 2 == 3.
  • Objects are parsed differently compared to gojq. gojq has a list of pipe queries, jqjq will only have one that might be a pipe op.
  • Less "passthrough" piggyback on jq features:
    • reduce/foreach via recursive function? similar to if or {}-literal?
    • try/catch via some backtrack return value? change [path, value] to include an error somehow?
  • How to support label/break?
  • How to support delpaths (used by del etc.)? Have to keep paths the same while deleting a group of paths? Use a sentinel value? Work with paths instead?
  • Rewrite AST before eval. Currently if and some others do rewrites (optional parts etc.) while evaluating.
  • Rethink invalid path handling. Currently [null] is used as a sentinel value.
  • {a:123} | .a |= empty should remove the key.

Useful references

Tools and tricks

  • jq -n --debug-dump-disasm '...' show jq byte code
  • jq -n --debug-trace=all '...' show jq byte code run trace
  • jq -n '{a: "hello"} | debug' 2> >(jq -R 'gsub("\u001b\\[.*?m";"") | fromjson' >&2) pretty print debug messages
  • GOJQ_DEBUG=1 go run -tags gojq_debug cmd/gojq/main.go -n '...' run gojq in debug mode
  • fq -n '".a.b" | _query_fromstring' gojq parse tree for string
  • fq -n '{...} | _query_tostring' jq expression string for gojq parse tree
  • For a convenient jq development experience:

Thanks to

  • stedolan for jq and for getting me interested in generator/backtracking based languages.
  • pkoppstein for writing about jq and PEG parsing.
  • itchyny for jqjq fixes and for gojq, from which I learned a lot and from which most of jqjq's AST design comes. Sharing the AST design made it easier to compare parser output (e.g. via fq's _query_fromstring). gojq also fixes some confusing jq bugs and has better error messages, which saves a lot of time.
  • Michael Färber @01mf02 for jaq, which is also where I learned about precedence climbing.

License

Copyright (c) 2022 Mattias Wadman

jqjq is distributed under the terms of the MIT License.

See the LICENSE file for license details.

jqjq's Issues

for a jq code beautifier

Hello wader,

I hope you are fine.
I discovered your jqjq project late.
I also made my own POSIX shell wrapper with jq for jq, but for another goal and with another design (mainly one jq to process the code and one jq to evaluate it).
I've been dreaming of a jq lexer/parser for a while, and you did it!

Now I'm thinking about a side project: a jq code beautifier!

With your lex and parse functions, I get the AST.
I need a function to convert the AST back to a string.

  • This "AST to string" does not exist in jqjq, does it?

I plan to write this "AST_to_string" function, but I'd rather ask first whether you have already done it.

Regards,

Intrinsics take precedence over user-overloaded filters

Filters that are passed through to jq take precedence over user-defined filters with the same signature. This does not match jq's behavior. Here's an example comparing debug, which is an intrinsic, and first, which is a builtin.

$ jqjq -n 'def debug: 42; debug'
["DEBUG:",null]
null
$ jq -n 'def debug: 42; debug'
42
$ jqjq -n 'def first: 42; first'
42
$ jq -n 'def first: 42; first'
42
