sadjad commented on July 22, 2024

Hi @drunksaint,

Let me try this out first, and will get back to you in a couple hours.

Best,
Sadjad

from gg.

sadjad commented on July 22, 2024

It looks like there's a bug in model-generic. Look at the function.args above; it totally omitted --inputfile and also didn't convert i.txt to @{GGHASH:}. I'm gonna take a look at it and fix the issue.

drunksaint commented on July 22, 2024

@sadjad let me know if i can help with anything

sadjad commented on July 22, 2024

Okay, I managed to reproduce the error with a simple script, and I'm getting the exact same message.

The problem

The PyInstaller bootstrapping function tries to open and read the binary itself. From the looks of it, it takes argv[0] as the path to the binary, but that path doesn't exist (the actual binary is located in .gg/blobs/BINARY_HASH).

That's the error message you're getting: /tmp/thunk-execute.FANRSb/argtest. It tries to find argtest in the current directory.

I can think of a few solutions:

Solution 1

You can instruct gg to create a link to the binary in the execution directory. However, it's a new feature and is currently only available through gg create-thunk, using the --link option. For example, in your case, after creating your binary, you can create your thunk like this:

gg create-thunk \
    --value $(gg hash input.txt) \
    --output output.txt \
    --executable $(gg hash argtest) \
    --placeholder output.txt \
    --link input.txt=$(gg hash input.txt) \
    --link argtest=$(gg hash argtest) \
    $(gg hash argtest) \
    argtest input.txt output.txt

Two links are created: one to input.txt and one to argtest. This simplifies the application, since it can refer to the files by those names, and PyInstaller would be happy...
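To make the effect of the link concrete, here is a minimal Python sketch that mimics the situation in a temporary directory. The directory names (`thunk-execute.demo`, `BINARY_HASH`) are placeholders modelled on the paths mentioned above, and the symlink only approximates what `--link` arranges; this is not gg's implementation.

```python
import os
import tempfile


def resolve_argv0(exec_dir: str, argv0: str) -> bool:
    """Return True if argv0, taken relative to exec_dir, names an existing file."""
    return os.path.exists(os.path.join(exec_dir, argv0))


with tempfile.TemporaryDirectory() as root:
    blobs = os.path.join(root, ".gg", "blobs")
    exec_dir = os.path.join(root, "thunk-execute.demo")  # placeholder name
    os.makedirs(blobs)
    os.makedirs(exec_dir)

    # The binary is stored under its hash, not under its original name.
    blob_path = os.path.join(blobs, "BINARY_HASH")
    with open(blob_path, "w") as f:
        f.write("#!/bin/sh\n")

    # A program that looks for itself as ./argtest finds nothing here.
    before_link = resolve_argv0(exec_dir, "argtest")

    # Assumption: a link named argtest that points at the blob is roughly
    # what `--link argtest=<hash>` provides in the execution directory.
    os.symlink(blob_path, os.path.join(exec_dir, "argtest"))
    after_link = resolve_argv0(exec_dir, "argtest")

print(before_link, after_link)
```

Once the name exists in the execution directory, a bootstrapper that reopens its own file through argv[0] has something to open.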

Solution 2

Of course, doing all that is not the most convenient way to create thunks. You can also change modes/model-generic.cc and add an option to tell it to include the link to the executable... (I can help with this, if you wanna go down this road).

Solution 3

Take a look at Nuitka. It's a Python compiler that's faster and makes smaller binaries than PyInstaller. I tried it with my simple script, and it works out of the box with gg (I haven't used it in real life before; it just looked promising!).

Please let me know if any of these helps!

Best,
Sadjad

drunksaint commented on July 22, 2024

Thanks for your help looking at this @sadjad. Both solution 1 and 3 worked for positional arguments! Solution 2 may not be required yet. I'll go down this road if I need to later. I had tried cython earlier but this was causing problems compiling larger libraries. That's why I started looking at pyinstaller. Nuitka seems to work great for larger libraries as well though. Thanks for this suggestion!

I expanded the test Python file argtest.py to use a more complex combination of positional and optional arguments, and this caused failures with both solutions 1 and 3. In solution 1, gg was trying to read the optional arguments itself (gg-create-thunk: unrecognized option '--inputfile=i.txt'), and in solution 3, I think my wrapper function is incorrect.

The command I used (i.txt and j.txt are input files whose line counts are read):

python argtest.py 34 i.txt o.txt --arg 45 --inputfile j.txt --outputfile p.txt

My wrapper file:

#!/bin/bash
model-generic "/path/to/argtest @ @infile @outfile --inputfile=@infile --outputfile=@outfile" "$@"

  • is there something wrong with my wrapper file?
  • does gg create-thunk accept commands that use optional arguments?
  • is there some documentation on how to create wrappers?

sadjad commented on July 22, 2024

Hello there,

Glad it worked!

is there something wrong with my wrapper file?

I think the only thing it's missing is the --arg option. You need to tell model-generic about the non-file options as well, so it can parse the whole command correctly. For example, in this case, you need to add --arg=@ to the description.

does gg create-thunk accept commands that use optional arguments?

Yes, it does, but you need to tell it explicitly where the create-thunk options end and your arguments begin, by passing -- right before the positional arguments:

gg create-thunk \
    --value $(gg hash input.txt) \
    ...
    --output output.txt \
    -- $(gg hash binary) argtest input.txt output.txt --any-option-you-like test
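The `--` marker is a general end-of-options convention, not something specific to gg. As an analogy only, the sketch below uses Python's argparse, which follows the same convention; `toy-create-thunk` and its two options are a stand-in built for this example and say nothing about how gg-create-thunk parses its arguments.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Toy stand-in for an option parser; NOT gg's real parser.
    parser = argparse.ArgumentParser(prog="toy-create-thunk", allow_abbrev=False)
    parser.add_argument("--value", action="append", default=[])
    parser.add_argument("--output", action="append", default=[])
    parser.add_argument("command", nargs="+")
    return parser


parser = build_parser()

# Without "--", the wrapped program's own option is mistaken for an
# (unknown) option of the outer tool.
_, unknown = parser.parse_known_args(
    ["--output", "o.txt", "HASH", "argtest", "--inputfile=i.txt"]
)

# With "--", everything after it is passed through as positional arguments.
args = parser.parse_args(
    ["--output", "o.txt", "--", "HASH", "argtest", "--inputfile=i.txt"]
)

print(unknown)
print(args.command)
```

The first call reports `--inputfile=i.txt` as unrecognized, which mirrors the error quoted earlier in the thread; the second call keeps it as part of the command.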

is there some documentation on how to create wrappers?

Sadly no. There are a few examples in here, and frankly, that's really all that's supported by model-generic.

drunksaint commented on July 22, 2024

I tried adding --arg=@. But the gg model creation gives an error.
My wrapper file:

#!/bin/bash
model-generic "/path/to/argtest @ @infile @outfile --arg=@ --inputfile=@infile --outputfile=@outfile" "$@"

The error I get:

$ gg infer argtest 34 i.txt o.txt --arg 45 --inputfile j.txt --outputfile p.txt
terminate called after throwing an instance of 'std::runtime_error'
  what():  unexpected token in description
/path/to/argtest: line 2:  3516 Aborted                 (core dumped) model-generic "/path/to/argtest @ @infile @outfile --arg=@ --inputfile=@infile --outputfile=@outfile" "$@"
$ gg infer argtest 34 i.txt o.txt --arg=45 --inputfile=j.txt --outputfile=p.txt
terminate called after throwing an instance of 'std::runtime_error'
  what():  unexpected token in description
/path/to/argtest: line 2:  3531 Aborted                 (core dumped) model-generic "/path/to/argtest @ @infile @outfile --arg=@ --inputfile=@infile --outputfile=@outfile" "$@"

I can add a PR for simple documentation to use a custom binary with gg (create wrapper file, python to binary) if that helps.

sadjad commented on July 22, 2024

The issue was that we didn't have support for non-file positional arguments (the first @ in your arguments). I just pushed a commit that should fix that problem.

I can add a PR for simple documentation to use a custom binary with gg (create wrapper file, python to binary) if that helps.

That would be amazing. Thank you!

drunksaint commented on July 22, 2024

Nice! Now the thunk creation goes through, but gg force fails:

$ gg infer argtest 65 i.txt o.txt --arg=23 --inputfile=j.txt --outputfile=p.txt
$ gg force o.txt 
→ Loading the thunks...  done (0 ms).
usage: argtest [-h] [--arg ARG] [--inputfile INPUTFILE]
               [--outputfile OUTPUTFILE]
               posarg posinfile posoutfile
argtest: error: argument posinfile: can't open 'i.txt': [Errno 2] No such file or directory: 'i.txt'
std::exception
 `TsCTUcZt2lNy.X4aqO5fZapqL5rQt219d.TVJVfDlcaA0000016d': process exited with failure status 2
gg-force: `TsCTUcZt2lNy.X4aqO5fZapqL5rQt219d.TVJVfDlcaA0000016d': process exited with failure status 5

The Python file, if it helps. I created the binary using:

python -m nuitka --follow-imports argtest.py -o argtest
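The linked argtest.py is not reproduced in this thread. The following is a hypothetical reconstruction that matches the usage message above (three positionals plus --arg, --inputfile and --outputfile) and the earlier description that the input files' line counts are read. What the real script writes to its outputs is unknown, so the output format here is an assumption.

```python
import argparse


def count_lines(path: str) -> int:
    """Count the lines in a text file."""
    with open(path) as f:
        return sum(1 for _ in f)


def build_parser() -> argparse.ArgumentParser:
    # Argument names follow the usage message; types and defaults are guesses.
    parser = argparse.ArgumentParser(prog="argtest")
    parser.add_argument("posarg", type=int)
    parser.add_argument("posinfile")
    parser.add_argument("posoutfile")
    parser.add_argument("--arg", type=int, default=0)
    parser.add_argument("--inputfile")
    parser.add_argument("--outputfile")
    return parser


def main(argv=None) -> None:
    args = build_parser().parse_args(argv)
    # Assumed behavior: write "<number> <line count>" to each output file.
    with open(args.posoutfile, "w") as out:
        out.write(f"{args.posarg} {count_lines(args.posinfile)}\n")
    if args.inputfile and args.outputfile:
        with open(args.outputfile, "w") as out:
            out.write(f"{args.arg} {count_lines(args.inputfile)}\n")
```

To run it as a script, call `main()` under the usual `if __name__ == "__main__":` guard. Every file the program touches (i.txt, j.txt, o.txt, p.txt) has to be visible to gg as an input or output of the thunk, which is what the wrapper description is for.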

sadjad commented on July 22, 2024

Could you please run gg describe TsCTUcZt2lNy.X4aqO5fZapqL5rQt219d.TVJVfDlcaA0000016d and post the output here?

drunksaint commented on July 22, 2024

$ gg describe TsCTUcZt2lNy.X4aqO5fZapqL5rQt219d.TVJVfDlcaA0000016d
{
 "function": {
  "hash": "VwndxS_gxNE2mcwtSLrx9tEXMvi75zyQaPl5DAZEY8PA00068380",
  "args": [
   "argtest",
   "65",
   "i.txt",
   "o.txt",
   "--arg=23",
   "@{GGHASH:VziaXgzsiNzeCIBBjDSZ9oHqywVIPYTBP5.ksiphPP_000000016}",
   "--outputfile=p.txt"
  ],
  "envars": []
 },
 "values": [
  "VQJaFeszdSCpcqZ.8IO313LxfNQhtfxIAF7wcf7U2nZc0000001c",
  "VziaXgzsiNzeCIBBjDSZ9oHqywVIPYTBP5.ksiphPP_000000016"
 ],
 "thunks": [],
 "executables": [
  "VwndxS_gxNE2mcwtSLrx9tEXMvi75zyQaPl5DAZEY8PA00068380"
 ],
 "outputs": [
  "p.txt",
  "o.txt"
 ],
 "links": [],
 "timeout": 0
}

$ cat o.txt 
#!/usr/bin/env gg-force-and-run
TsCTUcZt2lNy.X4aqO5fZapqL5rQt219d.TVJVfDlcaA0000016d#o.txt
$ cat p.txt 
#!/usr/bin/env gg-force-and-run
TsCTUcZt2lNy.X4aqO5fZapqL5rQt219d.TVJVfDlcaA0000016d

sadjad commented on July 22, 2024

I just pushed a commit that hopefully fixes the issue!

When I was looking at the model-generic implementation, I remembered how narrow it was. It should be fine for now, but, for example, if instead of --inputfile A you pass --inputfile=A, it would not work. I'm motivated to redo the implementation to include support for all POSIX-style options, but that'll take some time :)

Please let me know if this fixes your problem.

Thank you!

drunksaint commented on July 22, 2024

@sadjad that would really help using gg with custom commands! :).

Your changes + replacing --inputfile=A with --inputfile A works perfectly!! Thanks for the fixes!
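If the commands are generated by a script, the rewrite from `--inputfile=A` to `--inputfile A` can be done mechanically before calling gg infer. A minimal sketch; `split_equals_options` is a hypothetical helper written for this example, not part of gg.

```python
def split_equals_options(argv):
    """Rewrite each `--name=value` token as two tokens: `--name`, `value`.

    Tokens after a bare `--` are left untouched, since they are no longer
    options by convention.
    """
    out = []
    options_ended = False
    for token in argv:
        if token == "--":
            options_ended = True
            out.append(token)
        elif not options_ended and token.startswith("--") and "=" in token:
            name, value = token.split("=", 1)
            out.extend([name, value])
        else:
            out.append(token)
    return out


print(split_equals_options(
    ["65", "i.txt", "o.txt", "--arg=23", "--inputfile=j.txt", "--outputfile=p.txt"]
))
```

The result can be appended to `["gg", "infer", "argtest"]` and passed to `subprocess.run`.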

I tried 2 other things:

  • boolean flags don't seem to work right now (command --flag). I can see the problem here. Maybe something needs to be added in generic.cc as well, but I'm not sure.
  • i tried seeing if gg could be made to use the redirection operator > by substituting it with @ in the wrapper file. Seems like that doesn't work as expected.

$ helloworld > o.txt

associated wrapper file:

#!/bin/bash
model-generic "/path/to/helloworld @ @outfile" "$@"

model inference error:

$ gg infer helloworld > o.txt 
terminate called after throwing an instance of 'std::runtime_error'
  what():  missing positional argument
/path/to/helloworld: line 2: 19498 Aborted                 (core dumped) model-generic "/path/to/helloworld @ @outfile" "$@"

The error message seems to be related to your latest commit, so I thought it might be relevant.

I really appreciate your help with everything here. Thanks!

UPDATE: I just realized that the shell strips everything from the redirection operator onward before gg sees it. I'm not sure what the best way to do this is.

sadjad commented on July 22, 2024

Awesome!

boolean flags don't seem to work right now (command --flag). I can see the problem here. Maybe something needs to be added in generic.cc as well, but I'm not sure.

You should not include boolean flags in the description; only options that take a required argument need to be listed.

i tried seeing if gg could be made to use the redirection operator > by substituting it with @ in the wrapper file. Seems like that doesn't work as expected.

The redirection operator is handled by the shell itself and is never passed to the program. So, in the case of helloworld > o.txt, the shell runs helloworld and writes its stdout to o.txt. The contract in a gg thunk is that it writes its output to a file, and then that file is grabbed by gg. Currently, there's no mechanism to directly tell gg to grab the stdout.

However, there's a trick you can play. You can wrap the command you wanna run in another script. For example:

#!/bin/sh

helloworld >o.txt

Then, create a thunk for this script, which writes its output to o.txt!
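The same trick can be written once as a small generic wrapper instead of one script per command. This is a sketch in Python under the assumption that an interpreter is available where the thunk runs; `run_to_file` is a name made up for this example.

```python
import subprocess


def run_to_file(command, output_path):
    """Run `command` (a list of arguments) and write its stdout to `output_path`.

    Returns the command's exit status so the caller can propagate failures.
    """
    with open(output_path, "wb") as out:
        completed = subprocess.run(command, stdout=out, check=False)
    return completed.returncode
```

For the example above, `run_to_file(["helloworld"], "o.txt")` plays the role of `helloworld >o.txt`, and o.txt is the file the thunk declares as its output.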

A year ago, I was trying to make gg work for simple command-line programs like cat and grep that write their output to stdout, by creating a generic wrapper (iowrap). It has since been abandoned, but feel free to take a look: https://github.com/sadjad/ggsh

drunksaint commented on July 22, 2024

You should not include boolean flags in the description; only options that take a required argument need to be listed.

Nice, this works! I've added this with our whole discussion to the documentation in this pull request

You can wrap the command you wanna run in another script.

Sounds good. I'll try this.

A year ago, I was trying to make gg work for simple command-line programs like cat and grep that write their output to stdout, by creating a generic wrapper (iowrap). It has since been abandoned, but feel free to take a look: https://github.com/sadjad/ggsh

This is neat! Much better than having to write wrapper commands for all scripts. I tried running it but wasn't sure how to add iowrap as a thunk. I added the files from ggsh/models to gg/src/models/wrappers and kept ggsh/iowrap in the current directory that I ran the commands from. It looks like gg didn't detect the iowrap thunk or something.

$ gg infer cat i.txt
TJcHES0HLwnIqnEgUVbrAvgj2r6aKDPoh9IcGzfM9fbs00000117

$ gg describe TJcHES0HLwnIqnEgUVbrAvgj2r6aKDPoh9IcGzfM9fbs00000117
{
 "function": {
  "hash": "VkIXLi2AvcdLUIbAIYdr4IfjH5c.ikp.MZ4QNEELTWPY00000133",
  "args": [
   "iowrap",
   "-",
   "out",
   "cat",
   "@{GGHASH:VziaXgzsiNzeCIBBjDSZ9oHqywVIPYTBP5.ksiphPP_000000016}"
  ],
  "envars": []
 },
 "values": [
  "VziaXgzsiNzeCIBBjDSZ9oHqywVIPYTBP5.ksiphPP_000000016=i.txt"
 ],
 "thunks": [],
 "executables": [
  "VkIXLi2AvcdLUIbAIYdr4IfjH5c.ikp.MZ4QNEELTWPY00000133=iowrap"
 ],
 "outputs": [
  "out"
 ],
 "links": [],
 "timeout": 0
}

$ gg force out 
→ Loading the thunks...  done (0 ms).
TJcHES0HLwnIqnEgUVbrAvgj2r6aKDPoh9IcGzfM9fbs00000117: execvpe failed
std::exception
 `TJcHES0HLwnIqnEgUVbrAvgj2r6aKDPoh9IcGzfM9fbs00000117': process exited with failure status 1
gg-force: `TJcHES0HLwnIqnEgUVbrAvgj2r6aKDPoh9IcGzfM9fbs00000117': process exited with failure status 5

$ gg create-thunk --value $(gg hash iowrap) --executable $(gg hash iowrap) $(gg hash iowrap) iowrap
gg-create-thunk: a thunk needs at least one output

cat especially helps with the linking step for custom commands.
Some help with how to set this up would be great. Thanks!

sadjad commented on July 22, 2024

Nice, this works! I've added this with our whole discussion to the documentation in this pull request

Thank you for the pull request! I just had a peek and it looks great. Will merge it as soon as possible.

I tried running it but wasn't sure how to add iowrap as a thunk.

You're almost there! You need to collect the iowrap file. From the directory of your program, run gg collect /path/to/iowrap to make a copy in .gg/blobs directory. Also you may need to collect your input file (i.txt) manually as well (these should be easy to fix).

The nice part is that you can pipe these commands together. For example, you can run:

gg infer sh -c 'cat i.txt | grep hello'

And it will work. (as far as I remember!)

(Unfortunately, gg infer cat i.txt | grep hello would not work. But imagine if instead of bash, there's a gg shell that understands these commands and takes care of things without having to explicitly type gg infer. That was the ultimate idea behind this ggsh thing...)

drunksaint commented on July 22, 2024

You're almost there! You need to collect the iowrap file. From the directory of your program, run gg collect /path/to/iowrap to make a copy in .gg/blobs directory. Also you may need to collect your input file (i.txt) manually as well (these should be easy to fix).

Nice, it works with this fix! Piping works too! Thanks! But I'm not sure I'll be able to use it, since the modeled cat looks like it works with only one input file. I'm not sure it is possible to send an unknown number of input files to a command. If I have to use cat for the final linking step, it can be done locally in that case.

I'm trying to parallelize a simple script. To do this, I'm splitting a file into small pieces and trying to create an output for each piece in an output directory. Outputs to the current directory work fine, but outputs to a subdirectory give an error:

$ mkdir outputdir
$ gg infer fileoutputtest outputdir/out.txt
$ gg force outputdir/out.txt 
→ Loading the thunks...  done (0 ms).
Issue in opening the Output file
std::exception
 `TmcJUtXkVfu6qE5vMqOPpVmKnO3RWSTvoHp66MaHCvPU0000009f': process died on signal 11
gg-force: `TmcJUtXkVfu6qE5vMqOPpVmKnO3RWSTvoHp66MaHCvPU0000009f': process exited with failure status 5

$ gg describe TmcJUtXkVfu6qE5vMqOPpVmKnO3RWSTvoHp66MaHCvPU0000009f
{
 "function": {
  "hash": "VGbXEAZKy6aaAzFPLtIR0m1JOnTchAJ2vw_7UJLiVe1s000020f8",
  "args": [
   "fileoutputtest",
   "o/out.txt"
  ],
  "envars": []
 },
 "values": [],
 "thunks": [],
 "executables": [
  "VGbXEAZKy6aaAzFPLtIR0m1JOnTchAJ2vw_7UJLiVe1s000020f8"
 ],
 "outputs": [
  "o/out.txt"
 ],
 "links": [],
 "timeout": 0
}

Seems like the issue is that the directory outputdir doesn't exist in the execution context. Looks like inputs can have directories since they are referred to by their hash but outputs cannot since there is no implicit directory creation in the execution context. Am I thinking about this the right way? Or is there some other way to create output files in a subdirectory?

sadjad commented on July 22, 2024

You're right about this. Currently the system doesn't create the output directory automatically. As a workaround, I think you can create the o/ directory in your script and then write the output file there.
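A minimal sketch of that workaround, written in Python for illustration (the fileoutputtest program in the thread may well be written in another language); `open_output` is a hypothetical helper name.

```python
import os


def open_output(path: str):
    """Open `path` for writing, creating any missing parent directories first."""
    parent = os.path.dirname(path)
    if parent:
        # exist_ok=True makes this safe when the directory is already there.
        os.makedirs(parent, exist_ok=True)
    return open(path, "w")
```

A program would then use `with open_output("o/out.txt") as f: ...` in place of a plain `open`, so it no longer depends on the directory existing in the execution context.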

sadjad commented on July 22, 2024

I'm not sure it is possible to send an unknown number of input files to a command.

This should be possible, because at the time of thunk generation, I think we know how many files we have. But I'm not sure if the current implementation of iowrap has support for multiple inputs.

drunksaint commented on July 22, 2024

Ah, I see: the gg create-thunk command can be generated dynamically. I've added multiple file support for cat to this pull request.
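A rough sketch of what generating the command dynamically could look like. It only assembles an argument list from the flags that appear earlier in this thread (--value, --link, --output, --executable and the `--` separator); the helper name is hypothetical, the hashes must come from `gg hash`, and whether this exact combination is accepted by a given gg build is untested here.

```python
def create_thunk_argv(executable_hash, program_args, inputs, outputs):
    """Assemble a `gg create-thunk` argument list for any number of inputs.

    inputs:  mapping of file name -> hash (as printed by `gg hash <file>`)
    outputs: list of output file names
    """
    argv = ["gg", "create-thunk"]
    for name, file_hash in inputs.items():
        # One --value per input, plus a link so the program can use the name.
        argv += ["--value", file_hash, "--link", f"{name}={file_hash}"]
    for name in outputs:
        argv += ["--output", name]
    argv += ["--executable", executable_hash]
    # "--" separates create-thunk's options from the function and its argv.
    argv += ["--", executable_hash, *program_args]
    return argv


print(create_thunk_argv(
    "EXEHASH",
    ["cat", "a.txt", "b.txt"],
    {"a.txt": "HASH_A", "b.txt": "HASH_B"},
    ["out"],
))
```

Because the list is built in a loop, the number of input pieces does not need to be known when the script is written, only when the thunk is created.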

I think I have a much better understanding of how gg can be used to parallelize a custom workload now. Thanks for your help with everything here! I'll close this issue.
