Parsing and unparsing have super-linear runtime, largely because OCaml's list append (@) and string concatenation (^) copy their left operand, so building a result one element at a time costs O(n^2).
Parsing: parse_tilde quadratic (/cubic?) behavior
for n in 1000 2000 4000 8000 16000 32000 64000; do perl -e "print '~'; print 'a' x $n;" > /tmp/t; time ./parse_to_json.native /tmp/t > /dev/null; done
user 0m0.015s
user 0m0.034s
user 0m0.153s
user 0m0.830s
user 0m4.540s
user 0m27.661s
user 3m4.591s
(real/sys runtime omitted for brevity)
Notes:
- This is likely because of this line in ast.ml:
  | c::s' -> parse_tilde (acc @ [c]) s'
  Each `acc @ [c]` copies the whole accumulator, so the loop alone is quadratic. It's not obvious to me why the measured runtime looks cubic (rather than quadratic), though.
- An alternative would be to prepend to the list, and then reverse the result at the end.
- The parse_to_json.native test application is from https://github.com/andromeda/pash/tree/main/compiler/parser
- To avoid conflating the libdash runtime with the JSON serialization, I disabled the JSON serialization code (print_ast) in the test app
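The prepend-and-reverse idea could look like the following sketch. This is a simplified stand-in, not the real parse_tilde from ast.ml (which has more cases); here the hypothetical function just collects characters up to ':' or '/' to show the shape of the fix:

```ocaml
(* Sketch of the prepend-and-reverse fix. Prepending with :: is O(1),
   and the single List.rev at the end is O(n), so the whole loop is
   linear instead of quadratic. *)
let parse_prefix (s : char list) : char list * char list =
  let rec go acc = function
    | [] -> (List.rev acc, [])
    | (':' | '/') :: _ as rest -> (List.rev acc, rest)
    | c :: s' -> go (c :: acc) s'   (* O(1) instead of acc @ [c] *)
  in
  go [] s

let explode str = List.init (String.length str) (String.get str)

let () =
  let prefix, rest = parse_prefix (explode "user/home") in
  assert (prefix = ['u'; 's'; 'e'; 'r']);
  assert (rest = ['/'; 'h'; 'o'; 'm'; 'e'])
```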
Parsing: to_assign cubic (?) behavior
for n in 1000 2000 4000 8000 16000 32000 64000; do perl -e "print 'f' x $n; print '=pay_respects;'" > /tmp/t1; time ./parse_to_json.native /tmp/t1 > /dev/null; done
user 0m0.011s
user 0m0.021s
user 0m0.098s
user 0m0.571s
user 0m3.749s
user 0m25.097s
user 2m52.969s
This also has an expensive list append in ast.ml:
| C c :: a -> to_assign (v @ [c]) a
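Here a Buffer avoids the quadratic append entirely. A simplified sketch (the real to_assign also handles escaped/quoted entries; this hypothetical version only models plain characters, splitting at the first '='):

```ocaml
(* Sketch: accumulate the variable name in a Buffer instead of a char
   list built with v @ [c]. Buffer.add_char is amortized O(1), so
   scanning an n-character name is O(n) overall. *)
let to_assign (cs : char list) : string * char list =
  let buf = Buffer.create 16 in
  let rec go = function
    | [] -> failwith "not an assignment: no '=' found"
    | '=' :: rest -> (Buffer.contents buf, rest)
    | c :: rest -> Buffer.add_char buf c; go rest
  in
  go cs

let explode str = List.init (String.length str) (String.get str)

let () =
  let name, rest = to_assign (explode "ffff=pay_respects") in
  assert (name = "ffff");
  assert (rest = explode "pay_respects")
```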
Unparsing: ^ string concatenation considered harmful
OCaml's "^" string operator copies both operands into a fresh string; concatenating n strings one at a time can therefore take O(n^2) runtime (https://discuss.ocaml.org/t/whats-the-fastest-way-to-concatenate-strings/2616/7). This is arguably a compiler issue; e.g., CPython optimizes the common in-place concatenation case (https://mail.python.org/pipermail/python-dev/2013-February/124031.html).
ast.ml's unparsing is essentially a series of "^" operations, so everything has a super-linear worst-case runtime.
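The linear alternative is to collect the pieces and join them once. A minimal sketch contrasting the two idioms (hypothetical `join_*` helpers, not the actual ast.ml code):

```ocaml
(* Quadratic: each ^ copies the entire accumulated string so far. *)
let join_slow pieces = List.fold_left (fun acc p -> acc ^ p) "" pieces

(* Linear: a Buffer (or String.concat) copies each piece exactly once. *)
let join_fast pieces =
  let buf = Buffer.create 256 in
  List.iter (Buffer.add_string buf) pieces;
  Buffer.contents buf

let () =
  let pieces = ["echo 'Hello '"; "| cat "; "| cat "] in
  assert (join_slow pieces = join_fast pieces);
  assert (join_fast pieces = String.concat "" pieces)
```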
Here's an example of a long pipeline, showing quadratic runtime for json_to_shell:
for n in 2000 4000 8000 16000 32000 64000 128000; do (echo -n "echo 'Hello '"; for f in `seq 1 $n`; do echo -n "| cat "; done; echo) > /tmp/s1; cat /tmp/s1 | ./parse_to_json.native > /tmp/j1; time ./json_to_shell.native /tmp/j1 | md5sum; done
user 0m0.035s
user 0m0.112s
user 0m0.508s
user 0m1.920s
user 0m13.909s
user 0m52.966s
user 4m26.274s
I also tried removing the Ast.to_string operation in json_to_shell.ml, to show that the JSON deserialization by itself is fast (linear); thus, the quadratic runtime is due to the libdash core.
Unparsing: fresh_marker for heredocs is slow on adversarial inputs
fresh_marker tries increasingly long variants (EOF, EOFF, EOFFF, ...) until it finds a marker that is not contained in the heredoc. Each candidate check scans the whole heredoc, so adversarial inputs that deliberately use all those markers force quadratic behavior:
cat <<CTHULHU
EOF
EOFF
EOFFF
EOFFFF
EOFFFFF
CTHULHU
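One way to sidestep the repeated scans: a marker one 'F' longer than the longest "EOF"^"F..." run in the body can never occur in it, and that run length can be found in a single pass. A sketch of this idea (not the actual fresh_marker; I'm assuming markers of the form EOFF...F as above):

```ocaml
(* Linear-time marker choice: scan the heredoc once for the longest
   "EOF" ^ F^k substring, then return a marker with one more 'F',
   which by construction cannot appear in the body. *)
let fresh_marker_linear (body : string) : string =
  let n = String.length body in
  let longest = ref 0 in
  for i = 0 to n - 3 do
    if String.sub body i 3 = "EOF" then begin
      let j = ref (i + 3) in
      while !j < n && body.[!j] = 'F' do incr j done;
      let run = !j - i in               (* length of "EOFF...F" at i *)
      if run > !longest then longest := run
    end
  done;
  if !longest = 0 then "EOF"
  else "EOF" ^ String.make (!longest - 2) 'F'

let () =
  assert (fresh_marker_linear "EOF\nEOFF\nEOFFF\n" = "EOFFFF");
  assert (fresh_marker_linear "hello" = "EOF")
```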