We use rspec-retry to deal with

Is it possible to retry failing specs on another worker? about knapsack_pro-ruby HOT 6 OPEN

knapsackpro commented on May 25, 2024

Is it possible to retry failing specs on another worker?

from knapsack_pro-ruby.

Comments (6)

MarkyMarkMcDonald commented on May 25, 2024

Thanks @ArturT - sorry for the delay, this slipped my radar.

This all makes sense to me, thanks for detailing out everything 👍 .

For our specific case:

We're using CircleCI. I don't think there's an automated retry; I think we'd have to write some custom retry logic, like "Detect if Firefox crashed based on the RSpec exceptions and have a dependent job run".
A lack of resources has been suggested by various Stack Overflow answers when digging into the root cause of the crashes. We've tried tweaking the worker sizes, but the crashes are still reproducible with the next size up. I'll take a look at switching to the latest Firefox and geckodriver, that's a good call 👍 .
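
A rough sketch of what the "detect a crash and let a dependent job react" idea could look like in a CI step. Everything here is an assumption for illustration: the `browser_crashed` helper name, the log path, and the crash signature string should be replaced with whatever your own failing runs actually print.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 when the given log file contains at least one
# of the crash signatures passed as the remaining arguments, 1 otherwise.
browser_crashed() {
  local log_file="$1"
  shift
  local pattern
  for pattern in "$@"; do
    # -F treats the signature as a fixed string, -q suppresses output.
    if grep -Fq -- "$pattern" "$log_file"; then
      return 0
    fi
  done
  return 1
}

# Example use (log path and signature are assumptions, not verified values):
#   if browser_crashed tmp/rspec.log "Failed to decode response from marionette"; then
#     touch tmp/browser_crashed   # marker file a dependent job could check
#   fi
```

The helper only classifies a finished run; triggering the dependent job itself would still have to be wired up in the CI configuration.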

I'll send in that email and try to find a few examples of runs exhibiting the crashes, thanks!

MarkyMarkMcDonald commented on May 25, 2024

We're not running into the original problem I posted about anymore (feel free to close this issue), but as an FYI: CircleCI has started experimental support for "rerun failed tests only".

We're trying this out, and here's how we combine CircleCI failure retries with Knapsack Pro:

    # Use circleci cli to find out if we need to run all tests or just failed tests
    # We are telling circleci to split the tests across 1 node to get the full list of all tests for consideration. We leave the splitting to Knapsack Pro.
    circleci tests glob "spec/**/*_spec.rb" | circleci tests run --index 0 --total 1 --command ">files.txt xargs echo" --verbose > files.txt

    # replace all spaces with newlines in files.txt file
    sed -i 's/ /\n/g' files.txt

    # tell knapsack pro to run only tests from files.txt (and still use queueing magic)
    if [[ -s "files.txt" ]]; then
      export KNAPSACK_PRO_TEST_FILE_LIST_SOURCE_FILE=files.txt
      bundle exec rake knapsack_pro:queue:rspec["${EXCLUDE_TAGS} ${FORMATTER_OPTIONS}"]
    fi
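
The `sed -i` step above relies on GNU sed behaviour. A possible portable variant, shown only as a sketch: the `normalize_file_list` name is made up for this example, and like the original it assumes test file paths contain no spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrites a whitespace-separated list of test files in
# place so that it holds exactly one path per line, with blank lines dropped.
normalize_file_list() {
  local list_file="$1"
  local tmp_file
  tmp_file="$(mktemp)"
  # Turn every run of whitespace into a single newline, then drop empty lines.
  tr -s '[:space:]' '\n' < "$list_file" | sed '/^$/d' > "$tmp_file"
  mv "$tmp_file" "$list_file"
}

# Example use, replacing the sed step:
#   normalize_file_list files.txt
```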

ArturT commented on May 25, 2024

Hi @MarkyMarkMcDonald

"Is there a mechanism with knapsack to retry failed specs on another worker?"

No.

Even if there were such a mechanism, there might be edge cases. When could we consider that all parallel jobs completed their work properly? Let's say you have a problematic CI job where the Firefox process is failing and you can't run tests there. Let's say the tests are put back in the Queue in knapsack_pro Queue Mode so that other parallel CI jobs can consume them.
But it's possible that all the other jobs finished processing their tests at around the same time the problematic CI job failed, so there would be no parallel jobs available to pick up the tests from the problematic CI job.

Most likely there are ways to solve this edge case, maybe by forcing CI jobs to wait for some time until all test files are acknowledged by CI jobs, so that we know all jobs executed their tests.

There would also be an edge case when tests are not acknowledged by CI jobs, and we would have to handle that as well, maybe with a timeout.

There are probably more edge cases we need to consider as well. These are some off the top of my head.

Most likely there is no simple solution right away to handle this. We would have to collect more feedback from other users on whether they would find it useful to auto-assign tests to other jobs when a CI job can't run tests, and then try to find as simple a solution as possible to avoid edge cases.

The simplest action for you to take for now could be:

What CI provider do you use? Some of them, like Buildkite, allow you to automatically retry failed jobs on a new isolated machine, so this should restart Firefox. Maybe there is a way to configure auto-retry of the parallel job on your CI server?
You could try adding more resources (CPU/RAM/disk) to the CI server to see if it's more stable. Upgrade to the latest Firefox to ensure it's not some bug.

What is your organization ID or email? You can send it to [email protected] and I can review your account.
Do you use Queue Mode or Regular Mode?

ArturT commented on May 25, 2024

I'm pasting here an idea that might be useful for others looking at this issue:

You could collect the test file paths of failing tests from all parallel nodes and generate a file to pass as KNAPSACK_PRO_TEST_FILE_LIST_SOURCE_FILE=tmp/my_failing_specs.txt
You could extract them from the JUnit XML report: https://knapsackpro.com/faq/question/how-to-use-junit-formatter#how-to-use-junit-formatter-with-knapsack_pro-queue-mode

Use this list of test files tmp/my_failing_specs.txt to initialize a new CI build (or dependent CircleCI job):
https://knapsackpro.com/faq/question/how-to-run-a-specific-list-of-test-files-or-only-some-tests-from-test-file
(please ensure you have updated the knapsack_pro gem version first)
Please ensure the list of test files in tmp/my_failing_specs.txt is the same set of tests on all parallel nodes. Only one parallel CI node will initialize the Queue with a set of tests (the one that hits our API endpoint first; we don't know which one it will be). That is why you must have the same set of test files in tmp/my_failing_specs.txt on all parallel CI nodes.
You would have to collect the list of failed test files from all CI nodes and merge them into one file, tmp/my_failing_specs.txt, before you run a new CI build (or a new job with retried failed tests).
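
The merge step could be sketched roughly as follows. This assumes each parallel node has already written its failing test file paths, one per line, to a `.txt` file that was gathered into a single directory (for example via CI artifacts); the `merge_failing_specs` helper and the directory layout are hypothetical, and extracting the paths from the JUnit XML report is not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merges per-node lists of failing test files into one
# sorted, de-duplicated list. The output file should live outside reports_dir.
merge_failing_specs() {
  local reports_dir="$1"
  local output_file="$2"
  mkdir -p "$(dirname "$output_file")"
  # Concatenate every per-node list, drop blank lines, and keep each path once.
  # LC_ALL=C keeps the ordering identical on every machine.
  cat "$reports_dir"/*.txt 2>/dev/null | sed '/^$/d' | LC_ALL=C sort -u > "$output_file"
}

# Example use in the retry job (directory name is an assumption):
#   merge_failing_specs tmp/failing_specs_per_node tmp/my_failing_specs.txt
#   export KNAPSACK_PRO_TEST_FILE_LIST_SOURCE_FILE=tmp/my_failing_specs.txt
#   bundle exec rake knapsack_pro:queue:rspec
```

Sorting and de-duplicating gives every parallel node a byte-identical file, which matches the requirement that all nodes see the same set of test files.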

ArturT commented on May 25, 2024

Running failed tests on another worker/CI node is part of the broader idea of improving the Queue API.
Related internal ticket:
https://trello.com/c/KjXa29IJ

ArturT commented on May 25, 2024

@MarkyMarkMcDonald Thanks for sharing the example.
