Comments (6)
Thanks @ArturT - sorry for the delay, this slipped my radar.
This all makes sense to me, thanks for laying everything out in detail.
For our specific case:
- We're using CircleCI. I don't think there's an automated retry. I think we'd have to write some custom retry logic, like "detect if Firefox crashed based on the RSpec exceptions and have a dependent job run".
- Lack of resources has been suggested by various Stack Overflow answers when digging into the root cause of the crashes. We've tried tweaking the worker sizes, but the crash is still reproducible with the next size up. I'll take a look at switching to the latest Firefox and geckodriver, that's a good call 👍.
I'll send in that email and try to find a few examples of runs exhibiting the crashes, thanks!
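The "custom retry logic" mentioned above could be sketched roughly as follows. This is only an illustration under assumptions: the function name `run_with_crash_retry` and the `CRASH_PATTERN` value are placeholders (the pattern must be replaced with whatever text your crashed runs actually print), and it retries on the same machine rather than in a dependent job.

```shell
#!/usr/bin/env bash

# Hypothetical crash signature; replace it with whatever your failing runs print.
CRASH_PATTERN="${CRASH_PATTERN:-Failed to decode response from marionette}"

# Usage: run_with_crash_retry MAX_ATTEMPTS COMMAND [ARGS...]
# Reruns COMMAND only when it failed AND its output contains CRASH_PATTERN.
run_with_crash_retry() {
  local max_attempts="$1"
  shift
  local attempt=1 status log
  log="$(mktemp)"
  while true; do
    status=0
    # Capture stdout and stderr so the output can be inspected afterwards.
    "$@" >"$log" 2>&1 || status=$?
    cat "$log"
    if [ "$status" -eq 0 ]; then
      break
    fi
    # Stop on ordinary test failures or when the attempts are used up.
    if [ "$attempt" -ge "$max_attempts" ] || ! grep -qF -- "$CRASH_PATTERN" "$log"; then
      break
    fi
    attempt=$((attempt + 1))
    echo "Browser crash detected, retrying (attempt ${attempt}/${max_attempts})" >&2
  done
  rm -f "$log"
  return "$status"
}
```

For example, `run_with_crash_retry 2 bundle exec rspec spec/features` would rerun the command once if the first attempt failed with the crash signature in its output. Note that the output is buffered until each attempt finishes, and a retry on the same machine may not help if the crash is caused by a lack of resources.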
from knapsack_pro-ruby.
We're not running into the original problem I posted about anymore (feel free to close this issue), but as an FYI: CircleCI has started experimental support for "rerun failed tests only".
We're trying this out, and here's how we combine CircleCI failure retries with Knapsack Pro:
# Use circleci cli to find out if we need to run all tests or just failed tests
# We are telling circleci to split the tests across 1 node to get the full list of all tests for consideration. We leave the splitting to Knapsack Pro.
circleci tests glob "spec/**/*_spec.rb" | circleci tests run --index 0 --total 1 --command ">files.txt xargs echo" --verbose > files.txt
# replace all spaces with newlines in files.txt file
sed -i 's/ /\n/g' files.txt
# tell knapsack pro to run only tests from files.txt (and still use queueing magic)
if [[ -s "files.txt" ]]; then
  export KNAPSACK_PRO_TEST_FILE_LIST_SOURCE_FILE=files.txt
  # quote the whole task so the shell does not treat the brackets as a glob
  bundle exec rake "knapsack_pro:queue:rspec[${EXCLUDE_TAGS} ${FORMATTER_OPTIONS}]"
fi
Is there a mechanism with knapsack to retry failed specs on another worker?
No.
Even if there were such a mechanism, there would be edge cases. When could we consider that all parallel jobs have completed their work properly? Say you have a problematic CI job where the Firefox process keeps failing, so you can't run tests there, and its tests are put back in the Queue in knapsack_pro Queue Mode so that other parallel CI jobs can consume them.
But it's possible that all the other jobs finished processing their tests at about the same time the problematic CI job failed, so no parallel job is left to pick up its tests.
Most likely there are ways to solve this edge case, for example by forcing CI jobs to wait for some time until all test files are acknowledged by CI jobs, so that we know every test file was executed.
There would also be an edge case when tests are never acknowledged by any CI job, and we would have to handle that as well, maybe with a timeout.
There are probably more edge cases to consider. These are just some off the top of my head.
Most likely there is no simple solution to this right away. We would have to collect more feedback from other users on whether they would find it useful to auto-assign tests to other jobs when a CI job can't run them, and then look for a solution that is as simple as possible to avoid edge cases.
The simplest action for you to take for now could be:
- What CI provider do you use? Some of them, like Buildkite, allow you to automatically retry failed jobs on a new isolated machine, which should restart Firefox. Maybe there is a way to configure auto-retry of the parallel job on your CI server?
- You could try adding more resources (CPU/RAM/disk) to the CI server to see if it's more stable. Upgrade to the latest Firefox to rule out a browser bug.
What is your organization ID or email? You can send it to [email protected] and I can review your account.
Do you use Queue Mode or Regular Mode?
I'm pasting an idea here that might be useful for others looking at this issue:
You could collect the test file paths of failing tests from all parallel nodes, write them to a file, and point Knapsack Pro at it with KNAPSACK_PRO_TEST_FILE_LIST_SOURCE_FILE=tmp/my_failing_specs.txt
You could extract the paths from the JUnit XML report. https://knapsackpro.com/faq/question/how-to-use-junit-formatter#how-to-use-junit-formatter-with-knapsack_pro-queue-mode
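As a rough sketch of that extraction step, the function below pulls the file paths of failed examples out of one or more JUnit XML reports. It is a line-based heuristic, not a real XML parser, and it assumes the report puts a `file="..."` attribute on each `<testcase>` element and marks failures with `<failure>` or `<error>` children; check your own report format before relying on it. The function name `extract_failed_spec_files` is made up for this example.

```shell
#!/usr/bin/env bash

# Usage: extract_failed_spec_files REPORT.xml [MORE_REPORTS.xml...]
# Prints the unique file paths of test cases that contain a failure or error.
extract_failed_spec_files() {
  awk '
    /<testcase / {
      # Remember the file attribute of the most recent test case (assumed layout).
      current = ""
      if (match($0, /file="[^"]*"/)) {
        current = substr($0, RSTART + 6, RLENGTH - 7)
      }
    }
    /<failure|<error/ {
      if (current != "") print current
    }
  ' "$@" | sort -u
}
```

For example, `extract_failed_spec_files tmp/rspec.xml > tmp/my_failing_specs.txt` would write one failing spec file per line, where `tmp/rspec.xml` stands for wherever your formatter writes its report.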
Use this list of test files (tmp/my_failing_specs.txt) to initialize a new CI build (or a dependent CircleCI job):
https://knapsackpro.com/faq/question/how-to-run-a-specific-list-of-test-files-or-only-some-tests-from-test-file
(please make sure you have updated the knapsack_pro gem first)
Please ensure that the list of test files in tmp/my_failing_specs.txt is the same on all parallel nodes. Only one parallel CI node initializes the Queue with the set of tests (the one that hits our API endpoint first, and we don't know which one it will be). That is why you must have the same set of test files in tmp/my_failing_specs.txt on all parallel CI nodes.
You would have to collect the list of failed test files from all CI nodes and merge it into one file, tmp/my_failing_specs.txt, before you run a new CI build (or a new job that retries the failed tests).
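A minimal sketch of the merge step, assuming each parallel node has already saved its own list of failing spec files (one path per line) into a shared directory. The directory layout and the function name `merge_failed_specs` are hypothetical.

```shell
#!/usr/bin/env bash

# Usage: merge_failed_specs INPUT_DIR OUTPUT_FILE
# INPUT_DIR is assumed to hold one *.txt list per CI node.
merge_failed_specs() {
  local input_dir="$1" output_file="$2"
  mkdir -p "$(dirname "$output_file")"
  # Drop a leading "./" and blank lines, then deduplicate, so every node
  # ends up with an identical, sorted list.
  cat "$input_dir"/*.txt \
    | sed -e 's#^\./##' -e '/^[[:space:]]*$/d' \
    | sort -u > "$output_file"
}
```

For example, `merge_failed_specs failed_specs tmp/my_failing_specs.txt` followed by `export KNAPSACK_PRO_TEST_FILE_LIST_SOURCE_FILE=tmp/my_failing_specs.txt`. Because the output is sorted and deduplicated, every node that runs the same merge over the same inputs gets the same file, which matches the requirement above.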
Running failed tests on another worker/CI node is part of the broader idea of improving the Queue API.
Related internal ticket:
https://trello.com/c/KjXa29IJ
@MarkyMarkMcDonald Thanks for sharing the example.