Comments (15)

asfaltboy commented on August 22, 2024

We do something like this in bash:

  1. First, run all the commands together in the background, using a temp file to store each command's output.
  2. Then iterate over the commands in order, waiting for each one to complete and displaying its output.

The side effect of this simple method is that it seemingly "stalls" on the slowest command, returning only when they have all completed. This means the CMDS array should preferably be sorted fastest to slowest.

# pids, stdouts, timers, and codes start out empty; C_GREEN, C_RED,
# C_UNDERLINE, and C_RESET are assumed to be predefined ANSI escape variables.
pids=() stdouts=() timers=() codes=()

# Start every command in the background, capturing its combined output and
# its `time` measurement in separate temp files.
for cmd in "${CMDS[@]}"; do
    stdout="$(mktemp)"
    timer="$(mktemp)"
    { { time $cmd >>"$stdout" 2>&1 ; } >>"$timer" 2>&1 ; } &
    pids+=($!)
    stdouts+=("$stdout")
    timers+=("$timer")
done

# Wait for each command in list order and report it: green with its timing
# on success, red with its captured output on failure.
for i in ${!CMDS[*]}; do
    if wait "${pids[$i]}"; then
        codes+=(0)
    else
        codes+=(1)
    fi

    if [ "${codes[$i]}" -eq "0" ]; then
        echo -en "${C_GREEN}"
        echo -en "${CMDS[$i]}"
        echo -en "$C_RESET"
        echo -e " ($(cat "${timers[$i]}")s)"
    else
        echo -en "${C_RED}${C_UNDERLINE}"
        echo -en "${CMDS[$i]}"
        echo -e "$C_RESET"
        echo -e "$(cat "${stdouts[$i]}")"
    fi
    echo ""
done

jnoortheen commented on August 22, 2024

I've actually implemented the suggested solution here using the asyncio.subprocess module. It simply writes each command's stdout to sys.stdout and its stderr to sys.stderr.
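
A minimal sketch of that approach (assuming plain shell command strings; this is not the actual linked implementation): run all commands concurrently via asyncio.subprocess, let each inherit the parent's stdout/stderr, and exit non-zero if any command failed.

import asyncio
import sys

async def run(cmd: str) -> int:
    # No stdout/stderr arguments: the child inherits this process's streams,
    # so output from concurrent commands may interleave on the console.
    proc = await asyncio.create_subprocess_shell(cmd)
    return await proc.wait()

async def main(cmds: list[str]) -> int:
    codes = await asyncio.gather(*(run(c) for c in cmds))
    return max(codes, default=0)  # non-zero if any command failed

if __name__ == "__main__":
    sys.exit(asyncio.run(main(sys.argv[1:])))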

sewi-cpan commented on August 22, 2024

+1 for this request

nat-n commented on August 22, 2024

This is not currently supported. I considered it when first implementing the sequence task type. I thought it might be nice if, by default, an array inside an array were interpreted as a ParallelTask type within a SequenceTask, so that for example the following would run mypy and pylint in parallel, then pytest after that:

test = [["mypy", "pylint"], "pytest"]

And of course you could also do:

test.parallel = ["mypy", "pylint", "pytest"]

However, the problem is that I'm not sure what it should do with stdout. I imagine one wouldn't simply want both subprocesses to write to the same console at the same time! Maybe there could be a solution along the lines of capturing the output and feeding it out to the console one line at a time (maybe with a prefix linking it to the task that produced it, kind of like docker-compose does), but that's getting complicated to implement.

As I mention in #26, if the stdout of those tasks were configured to be captured anyway – such as for use in another task, or maybe to be piped to a file or discarded – then this problem goes away, and the tasks might as well be run in parallel. There's just the question left of how to handle a failure of one task in the set (whether to wait for the others).

I'd like to support parallel execution, but I'm really not sure how it should work. What do you think @MartinWallgren?

ThatXliner commented on August 22, 2024

We could take some inspiration from https://github.com/open-cli-tools/concurrently#readme

luketych commented on August 22, 2024

+1 interest on implementing this

nat-n commented on August 22, 2024

Also, a potential if inelegant workaround would be to use a shell task with background jobs, something along the lines of:

[tool.poe.tasks.test]
shell = """
poe mypy &
poe pylint &
poe pytest &
wait $(jobs -p)  # note: wait reports the exit status of only the last job listed
"""

jnoortheen commented on August 22, 2024

Another way is to use the GNU parallel command:

parallel ::: "flake8" "mypy dirname"

@nat-n the initial implementation can be very simple:

  1. Let's say three tasks are passed: we give the tty to the first task only (meaning no capturing), so the user can see that task's progress live. Once the first task finishes running, we print the output/errors of the next task, and so on (sketched below).
  2. Regarding errors: we run all tasks even if we encounter errors, return a failure code, and report which tasks failed (given 1, the errors will already have been printed).

We can later add some config about how these are executed. It could be a cross-platform alternative to parallel.
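
A rough sketch of that idea (hypothetical code, not poethepoet's): the first command inherits the terminal, the rest write to temp files, and each buffer is replayed in list order as its command finishes.

import asyncio
import sys
import tempfile

async def run_all(cmds: list[str]) -> int:
    procs, bufs = [], []
    for i, cmd in enumerate(cmds):
        if i == 0:
            # First task gets the tty: no capturing, live progress.
            proc = await asyncio.create_subprocess_shell(cmd)
            bufs.append(None)
        else:
            buf = tempfile.TemporaryFile()
            proc = await asyncio.create_subprocess_shell(cmd, stdout=buf, stderr=buf)
            bufs.append(buf)
        procs.append(proc)

    failed = []
    for cmd, proc, buf in zip(cmds, procs, bufs):
        code = await proc.wait()  # wait for the tasks in list order
        if buf is not None:
            buf.seek(0)
            sys.stdout.buffer.write(buf.read())  # replay the buffered output
            buf.close()
        if code != 0:
            failed.append(cmd)
    if failed:
        print("failed:", ", ".join(failed), file=sys.stderr)
    return 1 if failed else 0

if __name__ == "__main__":
    sys.exit(asyncio.run(run_all(sys.argv[1:])))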

ThatXliner commented on August 22, 2024

We can later add some config about how these are executed. It could be a cross-platform alternative to parallel.

like a backend config on how to parallelize?

jnoortheen commented on August 22, 2024

like a backend config on how to parallelize?

Yes, some task- or project-level configs.

nat-n commented on August 22, 2024

Hi @jnoortheen, thanks for the idea.

I understand that you're proposing the following strategy which I'll call Strategy 1:

  1. let the first task in the list output directly to stdout until it completes
  2. for each subsequent task: buffer its stdout in memory (or a tempfile, to avoid unbounded memory use) until it completes
  3. dump the buffered output of each completed task once the output of all preceding tasks has been dumped

This is probably the best solution in terms of having a coherent output log at the end. However, it assumes that the tasks in the list are meaningfully ordered, which needn't be the case. It might therefore sometimes make more sense to use the following Strategy 2 instead:

  1. treat all tasks in the list as having equal precedence and buffer their output until they complete
  2. whenever a task completes, dump its output to stdout, even if tasks specified earlier in the list are still running (sketched below)
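
A minimal sketch of Strategy 2 (hypothetical, using asyncio.as_completed to handle tasks in completion order):

import asyncio

async def run_captured(cmd: str) -> tuple[str, int, bytes]:
    proc = await asyncio.create_subprocess_shell(
        cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,  # fold stderr into the same buffer
    )
    out, _ = await proc.communicate()
    return cmd, proc.returncode, out

async def strategy2(cmds: list[str]) -> int:
    worst = 0
    tasks = [asyncio.create_task(run_captured(c)) for c in cmds]
    for fut in asyncio.as_completed(tasks):  # yields results in completion order
        cmd, code, out = await fut
        print(f"=== {cmd} (exit {code}) ===")  # clarify which task this is
        print(out.decode(errors="replace"), end="")
        worst = max(worst, code)
    return worst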

Both Strategy 1 and Strategy 2 would benefit from poe providing some extra output lines to clarify which output is from which task (unless running in quiet mode).

Strategy 3 would be like Strategy 2, except we capture and output each line of task output as it arrives, with some prefix indicating which task it came from (sketched below).
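
A sketch of Strategy 3 (hypothetical): stream each task's output line by line as it arrives, prefixed with the task name, docker-compose style.

import asyncio

async def stream(cmd: str, prefix: str) -> int:
    proc = await asyncio.create_subprocess_shell(
        cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
    )
    async for line in proc.stdout:  # the StreamReader yields lines as they arrive
        print(f"{prefix} | {line.decode(errors='replace')}", end="")
    return await proc.wait()

async def strategy3(cmds: list[str]) -> int:
    codes = await asyncio.gather(*(stream(cmd, cmd.split()[0]) for cmd in cmds))
    return max(codes, default=0)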

And Strategy 4 would be to just let all tasks output directly to stdout on top of one another, which may sometimes be necessary to support.

Are there any other strategies worth considering? Is it also worthwhile to be able to direct outputs to separate filesystem locations, e.g. f"task_name_{subtask_number}.out"?

I think it would be best if the user could configure the strategy for a specific parallel task independently for stdout and stderr, with Strategy 1 being the default for stdout and Strategy 3 or 4 being the default for stderr.

Maybe how to handle errors should also be configurable, with the default being to continue the other tasks but return non-zero at the end if one or more tasks fail. Having the option to stop all tasks when one fails, or even to always continue and return zero, would also make sense.
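
Something along these lines, purely illustrative (none of these option names exist in poethepoet):

[tool.poe.tasks.test]
parallel = ["mypy", "pylint", "pytest"]  # hypothetical "parallel" task type
stdout = "ordered"        # Strategy 1: replay buffers in list order
stderr = "interleaved"    # Strategy 3/4: stream output as it arrives
on_error = "continue"     # or "stop", or "ignore"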

I'm thinking this would require having a thread per running subtask, which is responsible for monitoring the subtask and handling its output.

To be clear, I would not be keen on making GNU parallel (or any other binary less common than bash itself) a dependency of poethepoet, and implementing such an integration mechanism would probably be a bit complex to get right.

Any other ideas?

ThatXliner commented on August 22, 2024

Seems good, but why 3 or 4 as the default for stderr?

On second thought, yeah: you want to see the errors quickly. I was thinking of multi-line errors/warnings like those from pip… so maybe buffer the lines a bit until, say, 0.2 seconds have passed with no new lines seen from process X?
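
A sketch of that debounce idea (hypothetical): keep buffering a stream's lines until roughly 0.2 seconds pass with no new line, then flush them as one block so related multi-line output (e.g. a traceback) stays together.

import asyncio
import sys

async def debounced(reader: asyncio.StreamReader, name: str) -> None:
    buf: list[bytes] = []

    def flush() -> None:
        if buf:
            sys.stdout.write(f"--- {name} ---\n")
            sys.stdout.write(b"".join(buf).decode(errors="replace"))
            buf.clear()

    while True:
        try:
            line = await asyncio.wait_for(reader.readline(), timeout=0.2)
        except asyncio.TimeoutError:
            flush()  # the stream went quiet: emit the buffered block
            continue
        if not line:  # EOF
            break
        buf.append(line)
    flush()  # emit whatever is left at EOF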

nat-n commented on August 22, 2024

I think this is an important feature, but it's currently not near the top of my list. If someone wants to submit a PoC for one or more of the strategies discussed above then that would help move it along :)

Strategy 1 using asyncio.subprocess as @jnoortheen suggests is probably a good place to start. I'm thinking this would be a new parallel task type that is otherwise similar to the sequence task type.

luketych commented on August 22, 2024

@nat-n what is currently at the top of your list? Maybe some of us could help on those.

JCHacking commented on August 22, 2024

I think it would be easier to run it in threads, since the current code is not asynchronous (maybe for version 0.3 it could be rewritten asynchronously?).
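
A thread-based sketch of the same idea (hypothetical): since subprocess.run already blocks, a thread pool gives parallelism without rewriting the existing synchronous code around asyncio.

import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_captured(cmd: str) -> tuple[str, int, str]:
    proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return cmd, proc.returncode, proc.stdout + proc.stderr

def run_parallel(cmds: list[str]) -> int:
    worst = 0
    with ThreadPoolExecutor(max_workers=max(1, len(cmds))) as pool:
        # pool.map preserves input order, like Strategy 1
        for cmd, code, output in pool.map(run_captured, cmds):
            print(f"=== {cmd} (exit {code}) ===")
            print(output, end="")
            worst = max(worst, code)
    return worst

if __name__ == "__main__":
    sys.exit(run_parallel(sys.argv[1:]))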
