Comments (15)

asfaltboy commented on August 22, 2024

We do something like this in bash:

  1. First, run all the commands together in the background, using a temp file to store each command's output.
  2. Then iterate over the commands in order, waiting for each one to complete and displaying its output.

The side effect of this simple method is that it seemingly "stalls" on the slowest command, returning only when they have all completed. This means the CMDS array should preferably be sorted fastest to slowest.

# pids, stdouts, timers, and codes start out empty; C_GREEN, C_RED,
# C_UNDERLINE, and C_RESET are assumed to be predefined ANSI escape variables.
pids=() stdouts=() timers=() codes=()

# Start every command in the background, capturing its combined output and
# its `time` measurement in separate temp files.
for cmd in "${CMDS[@]}"; do
    stdout="$(mktemp)"
    timer="$(mktemp)"
    { { time $cmd >>"$stdout" 2>&1 ; } >>"$timer" 2>&1 ; } &
    pids+=($!)
    stdouts+=("$stdout")
    timers+=("$timer")
done

# Wait for each command in list order and report it: green with its timing
# on success, red with its captured output on failure.
for i in ${!CMDS[*]}; do
    if wait "${pids[$i]}"; then
        codes+=(0)
    else
        codes+=(1)
    fi

    if [ "${codes[$i]}" -eq "0" ]; then
        echo -en "${C_GREEN}"
        echo -en "${CMDS[$i]}"
        echo -en "$C_RESET"
        echo -e " ($(cat "${timers[$i]}")s)"
    else
        echo -en "${C_RED}${C_UNDERLINE}"
        echo -en "${CMDS[$i]}"
        echo -e "$C_RESET"
        echo -e "$(cat "${stdouts[$i]}")"
    fi
    echo ""
done

jnoortheen commented on August 22, 2024

I've actually implemented the suggested solution here using the asyncio.subprocess module. It simply writes each command's stdout to sys.stdout and its stderr to sys.stderr.
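
A minimal sketch of that approach (assuming plain shell command strings; this is not the actual linked implementation): run all commands concurrently via asyncio.subprocess, let each inherit the parent's stdout/stderr, and exit non-zero if any command failed.

import asyncio
import sys

async def run(cmd: str) -> int:
    # No stdout/stderr arguments: the child inherits this process's streams,
    # so output from concurrent commands may interleave on the console.
    proc = await asyncio.create_subprocess_shell(cmd)
    return await proc.wait()

async def main(cmds: list[str]) -> int:
    codes = await asyncio.gather(*(run(c) for c in cmds))
    return max(codes, default=0)  # non-zero if any command failed

if __name__ == "__main__":
    sys.exit(asyncio.run(main(sys.argv[1:])))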

sewi-cpan commented on August 22, 2024

+1 for this request

nat-n commented on August 22, 2024

This is not currently supported. I considered it when first implementing the sequence task type. I thought it might be nice if, by default, an array inside an array were interpreted as a ParallelTask type within a SequenceTask, so that for example the following would run mypy and pylint in parallel, then pytest after that:

test = [["mypy", "pylint"], "pytest"]

And of course you could also do:

test.parallel = ["mypy", "pylint", "pytest"]

However, the problem is that I'm not sure what it should do with stdout. I imagine one wouldn't simply want both subprocesses to write to the same console at the same time! Maybe there could be a solution along the lines of capturing the output and feeding it out to the console one line at a time (maybe with a prefix linking it to the task that produced it, kind of like docker-compose does), but that's getting complicated to implement.

As I mention in #26, if the stdout of those tasks were configured to be captured anyway – such as for use in another task, or maybe to be piped to a file or discarded – then this problem goes away, and the tasks might as well be run in parallel. There's just the question left of how to handle a failure of one task in the set (whether to wait for the others).

I'd like to support parallel execution, but I'm really not sure how it should work. What do you think @MartinWallgren?

ThatXliner commented on August 22, 2024

We could take some inspiration from https://github.com/open-cli-tools/concurrently#readme

luketych commented on August 22, 2024

+1 interest on implementing this

nat-n commented on August 22, 2024

Also, a potential if inelegant workaround would be to use a shell task with background jobs, something along the lines of:

[tool.poe.tasks.test]
shell = """
poe mypy &
poe pylint &
poe pytest &
wait $(jobs -p)  # note: wait reports the exit status of only the last job listed
"""

jnoortheen commented on August 22, 2024

Another way is to use the GNU parallel command:

parallel ::: "flake8" "mypy dirname"

@nat-n the initial implementation can be very simple:

  1. Let's say three tasks are passed: we give the tty to the first task only (meaning no capturing), so the user can see that task's progress live. Once the first task finishes running, we print the output/errors of the next task, and so on (sketched below).
  2. Regarding errors: we run all tasks even if we encounter errors, return a failure code, and report which tasks failed (given 1, the errors will already have been printed).

We can later add some config about how these are executed. It could be a cross-platform alternative to parallel.
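
A rough sketch of that idea (hypothetical code, not poethepoet's): the first command inherits the terminal, the rest write to temp files, and each buffer is replayed in list order as its command finishes.

import asyncio
import sys
import tempfile

async def run_all(cmds: list[str]) -> int:
    procs, bufs = [], []
    for i, cmd in enumerate(cmds):
        if i == 0:
            # First task gets the tty: no capturing, live progress.
            proc = await asyncio.create_subprocess_shell(cmd)
            bufs.append(None)
        else:
            buf = tempfile.TemporaryFile()
            proc = await asyncio.create_subprocess_shell(cmd, stdout=buf, stderr=buf)
            bufs.append(buf)
        procs.append(proc)

    failed = []
    for cmd, proc, buf in zip(cmds, procs, bufs):
        code = await proc.wait()  # wait for the tasks in list order
        if buf is not None:
            buf.seek(0)
            sys.stdout.buffer.write(buf.read())  # replay the buffered output
            buf.close()
        if code != 0:
            failed.append(cmd)
    if failed:
        print("failed:", ", ".join(failed), file=sys.stderr)
    return 1 if failed else 0

if __name__ == "__main__":
    sys.exit(asyncio.run(run_all(sys.argv[1:])))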

ThatXliner commented on August 22, 2024

We can later add some config about how these are executed. It could be a cross-platform alternative to parallel.

like a backend config on how to parallelize?

jnoortheen commented on August 22, 2024

like a backend config on how to parallelize?

Yes, some task- or project-level configs.

nat-n commented on August 22, 2024

Hi @jnoortheen, thanks for the idea.

I understand that you're proposing the following strategy which I'll call Strategy 1:

  1. let the first task in the list output directly to stdout until it completes
  2. for each subsequent task: buffer its stdout in memory (or a tempfile, to avoid unbounded memory use) until it completes
  3. dump the buffered output of each completed task once the output of all preceding tasks has been dumped

This is probably the best solution in terms of having a coherent output log at the end. However, it assumes that the tasks in the list are meaningfully ordered, which needn't be the case. It might therefore sometimes make more sense to use the following Strategy 2 instead:

  1. treat all tasks in the list as having equal precedence and buffer their output until they complete
  2. whenever a task completes, dump its output to stdout, even if tasks specified earlier in the list are still running (sketched below)
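
A minimal sketch of Strategy 2 (hypothetical, using asyncio.as_completed to handle tasks in completion order):

import asyncio

async def run_captured(cmd: str) -> tuple[str, int, bytes]:
    proc = await asyncio.create_subprocess_shell(
        cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,  # fold stderr into the same buffer
    )
    out, _ = await proc.communicate()
    return cmd, proc.returncode, out

async def strategy2(cmds: list[str]) -> int:
    worst = 0
    tasks = [asyncio.create_task(run_captured(c)) for c in cmds]
    for fut in asyncio.as_completed(tasks):  # yields results in completion order
        cmd, code, out = await fut
        print(f"=== {cmd} (exit {code}) ===")  # clarify which task this is
        print(out.decode(errors="replace"), end="")
        worst = max(worst, code)
    return worst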

Both Strategy 1 and Strategy 2 would benefit from poe providing some extra output lines to clarify which output is from which task (unless running in quiet mode).

Strategy 3 would be like Strategy 2, except we capture and output each line of task output as it arrives, with some prefix indicating which task it came from (sketched below).
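
A sketch of Strategy 3 (hypothetical): stream each task's output line by line as it arrives, prefixed with the task name, docker-compose style.

import asyncio

async def stream(cmd: str, prefix: str) -> int:
    proc = await asyncio.create_subprocess_shell(
        cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
    )
    async for line in proc.stdout:  # the StreamReader yields lines as they arrive
        print(f"{prefix} | {line.decode(errors='replace')}", end="")
    return await proc.wait()

async def strategy3(cmds: list[str]) -> int:
    codes = await asyncio.gather(*(stream(cmd, cmd.split()[0]) for cmd in cmds))
    return max(codes, default=0)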

And Strategy 4 would be to just let all tasks output directly to stdout on top of one another, which may sometimes be necessary to support.

Are there any other strategies worth considering? Is it also worthwhile to be able to direct outputs to separate filesystem locations, e.g. f"task_name_{subtask_number}.out"?

I think it would be best if the user could configure the strategy for a specific parallel task independently for stdout and stderr, with Strategy 1 being the default for stdout and Strategy 3 or 4 being the default for stderr.

Maybe how to handle errors should also be configurable, with the default being to continue the other tasks but return non-zero at the end if one or more tasks fail. Having the option to stop all tasks when one fails, or even to always continue and return zero, would also make sense.
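
Something along these lines, purely illustrative (none of these option names exist in poethepoet):

[tool.poe.tasks.test]
parallel = ["mypy", "pylint", "pytest"]  # hypothetical "parallel" task type
stdout = "ordered"        # Strategy 1: replay buffers in list order
stderr = "interleaved"    # Strategy 3/4: stream output as it arrives
on_error = "continue"     # or "stop", or "ignore"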

I'm thinking this would require having a thread per running subtask, which is responsible for monitoring the subtask and handling its output.

To be clear, I would not be keen on making GNU parallel (or any other binary less common than bash itself) a dependency of poethepoet, and implementing such an integration mechanism would probably be a bit complex to get right.

Any other ideas?

ThatXliner commented on August 22, 2024

Seems good, but why 3 or 4 as the default for stderr?

On second thought, yeah: you want to see the errors quickly. I was thinking of multi-line errors/warnings like those from pip… so maybe buffer the lines a bit until, say, 0.2 seconds have passed with no new lines seen from process X?
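
A sketch of that debounce idea (hypothetical): keep buffering a stream's lines until roughly 0.2 seconds pass with no new line, then flush them as one block so related multi-line output (e.g. a traceback) stays together.

import asyncio
import sys

async def debounced(reader: asyncio.StreamReader, name: str) -> None:
    buf: list[bytes] = []

    def flush() -> None:
        if buf:
            sys.stdout.write(f"--- {name} ---\n")
            sys.stdout.write(b"".join(buf).decode(errors="replace"))
            buf.clear()

    while True:
        try:
            line = await asyncio.wait_for(reader.readline(), timeout=0.2)
        except asyncio.TimeoutError:
            flush()  # the stream went quiet: emit the buffered block
            continue
        if not line:  # EOF
            break
        buf.append(line)
    flush()  # emit whatever is left at EOF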

nat-n commented on August 22, 2024

I think this is an important feature, but it's currently not near the top of my list. If someone wants to submit a PoC for one or more of the strategies discussed above then that would help move it along :)

Strategy 1 using asyncio.subprocess as @jnoortheen suggests is probably a good place to start. I'm thinking this would be a new parallel task type that is otherwise similar to the sequence task type.

luketych commented on August 22, 2024

@nat-n what is currently at the top of your list? Maybe some of us could help on those.

JCHacking commented on August 22, 2024

I think it would be easier to run it in threads, since the current code is not asynchronous (maybe for version 0.3 it could be rewritten asynchronously?).
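
A thread-based sketch of the same idea (hypothetical): since subprocess.run already blocks, a thread pool gives parallelism without rewriting the existing synchronous code around asyncio.

import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_captured(cmd: str) -> tuple[str, int, str]:
    proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return cmd, proc.returncode, proc.stdout + proc.stderr

def run_parallel(cmds: list[str]) -> int:
    worst = 0
    with ThreadPoolExecutor(max_workers=max(1, len(cmds))) as pool:
        # pool.map preserves input order, like Strategy 1
        for cmd, code, output in pool.map(run_captured, cmds):
            print(f"=== {cmd} (exit {code}) ===")
            print(output, end="")
            worst = max(worst, code)
    return worst

if __name__ == "__main__":
    sys.exit(run_parallel(sys.argv[1:]))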
