skroutz / rspecq
Distribute and run RSpec suites among parallel workers; for faster CI builds
Home Page: https://rubygems.org/gems/rspecq
License: MIT License
Hi all
I'm experiencing odd Redis thrashing that makes the test suite run very long and overloads Redis.
Further information:
1030 tests in spec suite
redis 3.2 running in a docker container
ruby 2.7.4 running in a docker container
gems:
rspec 3.9.0
rspecq 0.7.1
redis 4.1.2
redis-rails 5.0.2
Interestingly, the commands per second are fine for 1 worker, but with 2 or more workers they jump by orders of magnitude, into the tens or hundreds of thousands, and saturate the Redis instance, slowing everything down.
I am running the commands with the following:
RSPECQ_BUILD=<8 digit randomly generated string>
RSPECQ_REDIS_URL=redis://redis:6379/8
RSPECQ_MAX_REQUEUES=3
for i in $sequence; do
echo "Testing $i"
TEST_ENV_NUMBER=$i bundle exec rspecq --worker=$i "$FILES" > /tmp/rspecq_${RSPECQ_BUILD}_${i} 2>&1 &
pids[${i}]=$!
done
Hi! I ported our tests from parallel_tests to rspecq to see if it might be a viable alternative for us. The architecture of a central test queue is something I've been looking for in Ruby testing for some time, and I'm optimistic about the future for this gem. So this isn't a bug report, but a performance report that I thought you might find useful.
First, here's a little background info. Our unit tests run in our Kubernetes staging cluster, on 16-core AWS EC2 machines. With parallel_tests, the suite typically takes around 8 minutes and 30 seconds to complete, give or take 15 seconds.
To test, I wrote a runner script, rspecq_runner, and a wrapper script, rspecq, to spin up multiple runners.
rspecq_runner
#!/bin/bash
echo "Setting up database $TEST_ENV_NUMBER..."
bin/rails db:setup &> /dev/null
echo "Running tests $TEST_ENV_NUMBER..."
bundle exec rspecq --build=$TEST_ID --update-timings --worker=$TEST_ENV_NUMBER spec/
echo "Dropping database $TEST_ENV_NUMBER..."
bin/rails db:drop &> /dev/null
rspecq
#!/bin/bash
# By default, run one rspecq runner per CPU thread
CPU_COUNT=$(getconf _NPROCESSORS_ONLN)
TEST_ID=$RANDOM
# Uncomment to hard-code how many rspecq runners you want
# CPU_COUNT=14
# Print the build ID that the runners actually use ($RANDOM would yield a new value here)
echo "Starting test $TEST_ID"
for i in $(seq $CPU_COUNT); do
# TEST_ENV_NUMBER is used to ensure each thread connects to its own Redis and DB instance
export TEST_ENV_NUMBER=$i
export TEST_ID
./bin/rspecq_runner &
done
bundle exec rspecq --build=$TEST_ID --report
echo "done"
First, I let the tests run a couple times on each of our test servers. This was done to ensure timings were stored in Redis. I then ran our test suite 8 times to see how long it took. Here's what I found:
Test run time |
---|
15:30 |
15:32 |
14:55 |
14:41 |
16:01 |
15:13 |
16:04 |
15:27 |
I then wanted to ensure I wasn't overloading the test machine, so I slowly incremented the number of runners:
CPU_COUNT | runtime |
---|---|
2 | 27:07 |
4 | 19:26 |
6 | 17:18 |
8 | 16:25 |
10 | 16:15 |
12 | 15:47 |
14 | 15:15 |
I was hoping for a quick port and magically everything would be faster. Turns out, there are no silver bullets.
As you can see from the numbers, it looks like a simple port from parallel_tests to rspecq would nearly double our CI time.
One option I do find interesting is spinning up runner pods that connect to a central Redis. This would allow us to distribute the CI load across our k8s cluster more evenly. I assume this would also scale better horizontally, and trade cluster CPU time for test speed without vertically scaling our EC2 instances. However, this seems like quite a bit more effort, and I won't have time to pursue it in the short term.
Thanks again for working on this gem. As I said before, I'm still optimistic about the future of rspecq. It was very easy to get up and running. If the runtimes were comparable, I would advocate internally for an immediate switch.
There should be a way to have builds fail early if, for example, 50% of the executed examples are failures. This should be configurable and toggled on demand (e.g. in some cases a user might really want to see all failures).
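A minimal sketch of such a check, assuming the worker or reporter already knows how many examples have run and failed so far. The method name `fail_fast?` and its keyword arguments are hypothetical and not part of rspecq; a `nil` threshold stands for "toggle the feature off".

```ruby
# Hypothetical helper; not part of rspecq's API.
# Returns true when the share of failed examples reaches the threshold.
# A nil threshold disables the check so that all failures get reported.
def fail_fast?(failed:, executed:, threshold:)
  return false if threshold.nil? || executed.zero?

  failed.fdiv(executed) >= threshold
end

puts fail_fast?(failed: 60, executed: 100, threshold: 0.5)
puts fail_fast?(failed: 10, executed: 100, threshold: 0.5)
puts fail_fast?(failed: 60, executed: 100, threshold: nil)
```

In practice one would probably also require a minimum number of executed examples before the ratio is trusted, so that a single early failure does not abort the build.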
Currently we requeue failures to guard against flaky tests. We also keep track of flaky tests and report them via Sentry (#21).
However, there is not much use of this feature until we also provide a way to reproduce these tests.
This is an umbrella task to track things we can do to make reproducing flaky tests in development mode possible.
We use a different bug tracking tool than sentry. Adding rspecq to the gem file and running some tests results in failures. I have traced this back to conflicts between the two bug tracking tools.
Since it doesn't make sense for us to include sentry in our dependency graph, does it make sense for sentry to be an optional dependency?
Since we can now pass the reproduction flag, which runs rspecq in a kind of test mode, we can also instruct it to auto-generate a build and worker ID.
This way at least the developers would not have to restart redis or always change the build id when testing locally.
Right now, we implicitly require administrators to have Redis configured with maxmemory and maxmemory-policy set to allkeys-lru. While this is convenient for us, it's not very bullet-proof since we don't guarantee that the timings key will not be evicted somehow. For example, if no builds have run for some time and some other app uses the same instance, the instance might reach maxmemory for other reasons and evict the timings key.
Instead of relying on this configuration, we could explicitly set TTLs on all the keys we use, except for those that should be persisted, i.e. timings.
This probably obsoletes #5
Instead of integrating third-party services directly into rspecq (like we do with Sentry), we could integrate indirectly by publishing events at key points of the flow. An easy first choice for this would be Active Support notifications.
Right now it's set by a constant. Instead, it should be set via a CLI flag to the worker, e.g. --max-requeues.
RSpecQ never expires keys from Redis. Instead it assumes the instance is configured to do so; see #55.
To protect users from this caveat, the reporter or the workers could detect the configuration (e.g. CONFIG GET) and warn the user if the instance is not configured to expire keys.
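As a rough sketch of that detection step: the helper below takes the settings as a Hash, of the kind a `CONFIG GET maxmemory*` call yields (for example via redis-rb's `redis.config(:get, "maxmemory*")`), and returns human-readable warnings. The method name and the warning wording are assumptions; only the `maxmemory` and `maxmemory-policy` settings come from Redis itself.

```ruby
# Hypothetical check; rspecq does not ship this.
# `config` is assumed to look like:
#   { "maxmemory" => "0", "maxmemory-policy" => "noeviction" }
def eviction_warnings(config)
  warnings = []

  # In Redis, maxmemory "0" means "no limit", so nothing is ever evicted.
  if config.fetch("maxmemory", "0").to_i.zero?
    warnings << "maxmemory is not set; keys will never be evicted"
  end

  policy = config.fetch("maxmemory-policy", "noeviction")
  unless policy == "allkeys-lru"
    warnings << "maxmemory-policy is #{policy}; expected allkeys-lru"
  end

  warnings
end

eviction_warnings("maxmemory" => "0", "maxmemory-policy" => "noeviction")
  .each { |message| warn "[rspecq] WARNING: #{message}" }
```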
First, this gem is off to a great start! Thank you for putting your time into an open source library that helps the ruby community move forward!
I'm currently using parallel_tests for a test suite that takes a couple hours to run without parallelization. Two killer features it has are:
bin/rails parallel:setup
-n [PROCESSES] How many processes to use, default: available CPUs
If you're not already familiar with these, the main idea here is that parallel_tests expects you to run multiple processes on the same machine. Each process gets an env var, TEST_ENV_NUMBER, which gives it a distinct ID to use. This ID can be used in the Rails database.yml like database: test_<%= test_env_number %>, which allows each process to use the same DB server, but unique databases during the test run.
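To illustrate the idea, here is a small sketch of deriving a per-process database name from TEST_ENV_NUMBER. The method name `test_database_name` and the `test` base name are assumptions for illustration; neither parallel_tests nor rspecq exposes this helper.

```ruby
# Hypothetical helper: builds a database name that is unique per process.
# An unset or empty TEST_ENV_NUMBER falls back to the plain base name.
def test_database_name(base = "test", env = ENV)
  suffix = env.fetch("TEST_ENV_NUMBER", "").to_s
  suffix.empty? ? base : "#{base}_#{suffix}"
end

puts test_database_name("test", "TEST_ENV_NUMBER" => "3")
puts test_database_name("test", {})
```

Each worker process would then create and use its own database on the shared DB server, so examples running in parallel do not trample each other's data.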
Are there plans for rspecq to do something similar?
I think the main benefit of handling this in rspecq is that it solves the problem once, rather than requiring all consumers to do similar setup work. Of course, that setup work isn't necessary if your runners are distributed. But this would give parallel_tests users a convenient on-ramp when switching to rspecq.
Currently we keep a single timings key (timings). If that is somehow lost, all scheduling is thrown out of the window. To avoid mishaps, we could also store the older timings keys (e.g. timings:<timestamp>) and use them if the latest one is somehow deleted.
Going a step further, we could also persist the key to disk and use it if no timings keys were found.
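A sketch of the fallback lookup, assuming the older keys follow a `timings:<unix timestamp>` naming scheme (the scheme and the method name are illustrative assumptions). The helper works on a plain list of key names, so it stays independent of how the keys are fetched from Redis.

```ruby
# Hypothetical helper: prefers the current "timings" key and otherwise
# falls back to the most recent "timings:<timestamp>" snapshot.
# Returns nil when no timings key exists at all.
def pick_timings_key(keys)
  return "timings" if keys.include?("timings")

  keys.grep(/\Atimings:\d+\z/).max_by { |key| key.split(":").last.to_i }
end

p pick_timings_key(["timings:1600000000", "timings:1700000000"])
p pick_timings_key([])
```

When the helper returns nil, the caller could fall back to a copy persisted on disk, as suggested above.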
Hi all!
Solid library you guys have here, been using it for a CI test optimization that I've been working on and it's been great.
Lately I've been trying to use the --file-split-threshold feature to split one of my Ruby on Rails test files, which is a bit slow (1 file has ~1000 examples, each taking around 0.5-1s to complete), so that it can be worked on by multiple workers. Running the file as 1 job takes about 400s to complete.
When I try to split the file using --file-split-threshold, it gets split into ~1000 jobs and none of the workers get past the 50th job. It turns out this is because the worker container ran out of memory (error code 137).
Here's a memory graph of said phenomenon (only 1 worker graph for this example)
Now I want to make sure if this is the problem on the tests I was running or a caveat with the library, so I pulled the rspecq repo and added this test
# my_spec.rb
RSpec.describe do
  1000.times do
    it do
      expect(true).to be true
    end
  end
end
Dry run
redis-cli flushdb
bundle exec rspecq --build=0 --seed 1 --worker=1 \
--update-timings test/sample_suites/timings/spec/my_spec.rb
...then run it for file splitting
bundle exec rspecq --build=1 --seed 1 --worker=1 --update-timings --file-split-threshold 0 test/sample_suites/timings/spec/my_spec.rb
Turns out it's also hogging the memory on this dummy test as well, steadily increasing the memory usage until 3,7 GB.
(I modified the logging a bit just to see the executed examples better.)
Now I'm pretty sure from the graph that this indicates a memory leak (or bloat?), and was wondering if there's something I'm missing before using the --file-split-threshold option. Maybe a configuration that I have to specify in spec_helper.rb or something like that. This is the spec_helper.rb that I used in my test:
RSpec.configure do |config|
  # other config here, not really relevant
  config.filter_run_when_matching :focus
end
Ruby version is 2.7.2
Can you help me look into this? I've been struggling with this out-of-memory problem for a while.
Thanks!
A short video that shows how to:
...could help with onboarding.
For test performance, we split our suite into acceptance and unit tests. We do this because acceptance specs require static assets to be built, which takes a decent amount of time, whereas unit specs can be run instantly. We accomplish this with RSpec tags. For example, here is how we run our acceptance tests using rspec-queue:
bin/_rspec-queue \
--namespace acceptance \
--tag capybara_feature \
--tag type:feature \
--tag type:request \
--tag js \
--tag webpack \
--format=doc \
--format=RspecJunitFormatter \
--out="tmp/test_results/rspec_acceptance/results-$CIRCLE_NODE_INDEX.xml" \
--requeue-tolerance=0.05 \
--max-requeues="$CIRCLE_NODE_TOTAL"
In order to use rspecq, we need to mimic this functionality. As of now, it does not seem like rspecq supports example filtering, is this right? If not, would it be possible to support this use case?
.rspec file in project root (not important)
Exceptions in rspecq are naturally visible and rare, since the whole build will fail. However, there are events which may not be errors, but affect QoS:
We should emit those to Sentry.
If there are no spec files in a project (i.e. the queue is empty), the build should be considered a failure (i.e. by the reporter), as a safety net against unexpected scenarios.
It would be nice to also include instructions on how to reproduce (locally) the execution order that led to the error.
We could perhaps submit the N (5-10) jobs that run prior to the flaky one, as a best-effort approach.
One thing we could also do, but this too is not so straightforward, is to emit the RSpec seed to Sentry. We could do these in next iterations.
Flaky tests, while not causing the build to fail, should still be fixed. Otherwise they can impact build times, since they can silently compound and cause many retries in each build. Consider a test suite that over time has accumulated 30 flaky tests. These could easily cause 90 additional example retries.
We should provide visibility into flaky tests (after we have determined that they are flaky). Now that #16 is merged, we can report them to Sentry.
Currently, flaky tests are all emitted as a single event, with the same title "Flaky jobs detected". Thus, flaky job events from different CI builds all end up under a single Sentry event. For instance, this is the sole flaky job event as reported in one of our test suites:
The problem with this approach is that it's hard to answer questions such as:
Also, it's impossible to set alerts (e.g. using code owners) based on the file in which flaky jobs occur, or to collaborate on specific issues to solve a particular flaky job (since we can't resolve a specific flaky job).
We have to think of a better way to report flaky jobs, whether this involves changing the fingerprint of the events, submitting separate events per flaky job/file, changing the title of events, or a combination of these.
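One of these options, separate events per flaky job with a per-job fingerprint, could look roughly like the sketch below. It only builds plain Hashes and makes no Sentry SDK call; the method name, the payload keys and the title wording are assumptions for illustration, not rspecq's current behaviour.

```ruby
# Hypothetical payload builder: one event per flaky job, so that each job
# groups under its own issue instead of a single "Flaky jobs detected" one.
def flaky_job_events(build_id, flaky_jobs)
  flaky_jobs.map do |job|
    # Strip a trailing example id such as "[1:2]" to get the spec file.
    file = job.sub(/\[.*\]\z/, "")

    {
      message: "Flaky job detected: #{job}",
      fingerprint: ["rspecq", "flaky-job", job],
      tags: { file: file },
      extra: { build: build_id }
    }
  end
end

events = flaky_job_events("build-123", ["./spec/a_spec.rb[1:2]", "./spec/b_spec.rb"])
events.each { |event| puts event[:message] }
```

Tagging each event with the file would make it possible to set file-based alerts, and the per-job fingerprint would let a specific flaky job be resolved on its own.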
In addition to the command-line flags we support, every configuration setting should also be optionally settable via an environment variable. This will ease integration with CI servers.
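A minimal sketch of this precedence (CLI flag over environment variable over default) using Ruby's stdlib OptionParser. The flag subset is illustrative only; RSPECQ_WORKER and the default of 3 requeues are assumptions, and this is not rspecq's actual option-parsing code.

```ruby
require "optparse"

# Hypothetical subset of options. Environment variables provide the
# initial values; any CLI flag that is given overrides them.
def parse_options(argv, env = ENV)
  opts = {
    build: env["RSPECQ_BUILD"],
    worker: env["RSPECQ_WORKER"], # assumed variable name
    max_requeues: Integer(env.fetch("RSPECQ_MAX_REQUEUES", "3")) # assumed default
  }

  OptionParser.new do |o|
    o.on("--build ID") { |v| opts[:build] = v }
    o.on("--worker ID") { |v| opts[:worker] = v }
    o.on("--max-requeues N", Integer) { |v| opts[:max_requeues] = v }
  end.parse(argv)

  opts
end

p parse_options(["--worker", "w1"], "RSPECQ_BUILD" => "abc123")
```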
Currently, we default it to a very big number (999999) to effectively disable it (because no jobs take more than that). That doesn't make much sense; we should instead be able to set it to nil to disable the splitting mechanism. The default should also be nil.
Currently we use plain puts inside Worker and Reporter to print various rspecq-level events like errors or warnings. However, this mixes our output with that generated by RSpec and makes it hard to differentiate between the two merely by glancing at the terminal. Ideally we should let RSpec print to stdout (the default) and use stderr for diagnostic messages originating from rspecq itself.
We should also use an actual logger and the appropriate levels for such cases, for a more detailed output. Ruby's Logger from stdlib should be sufficient.
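A small sketch of what this could look like with the stdlib Logger writing to stderr. The method name `build_logger`, the `rspecq` progname and the line format are assumptions for illustration.

```ruby
require "logger"

# Hypothetical factory: diagnostics go to stderr by default, which leaves
# stdout to RSpec's own formatters.
def build_logger(io = $stderr, level: Logger::INFO)
  logger = Logger.new(io)
  logger.level = level
  logger.progname = "rspecq"
  logger.formatter = proc do |severity, _time, progname, msg|
    "[#{progname}] #{severity}: #{msg}\n"
  end
  logger
end

logger = build_logger
logger.warn("worker w1 missed a heartbeat")
logger.debug("not printed at the default INFO level")
```

With this in place, redirecting stdout (for example to a results file) leaves rspecq's own messages visible on the terminal.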
In general, we want to know what each worker did after it is done: which files it ran, etc.
The tool should be independent: if it stops for any reason in any of those steps, it should be able to continue from the last step.
Post-install message from sentry-raven:
sentry-raven is deprecated! Please migrate to sentry-ruby
See https://docs.sentry.io/platforms/ruby/migration for the migration guide.
I got this error: RuntimeError: Queue not yet published after 30 seconds, with the Redis server listening on 127.0.0.1:6379, when running
bundle exec rspecq --build=123 --worker=foo1 spec/models/car_spec.rb
Can you help to fix this error?
In the event a worker dies (i.e. fails to emit a heartbeat in the specified timeframe) we should emit a warning to Sentry and also print a relevant warning to stdout.
I have not gotten rspecq to run successfully. When I execute bundle exec rspecq -b mybuild -w myworker, I get the following error message and stack.
bundler: failed to load command: rspecq (/home/me/workspace/vendor/bundle/bin/rspecq)
ArgumentError: Formatter '#<RSpecQ::Formatters::FailureRecorder:0x00007f07da84beb0>' unknown - maybe you meant 'documentation' or 'progress'?.
/home/me/workspace/vendor/bundle/gems/rspec-core-3.6.0/lib/rspec/core/formatters.rb:178:in `find_formatter'
/home/me/workspace/vendor/bundle/gems/rspec-core-3.6.0/lib/rspec/core/formatters.rb:146:in `add'
/home/me/workspace/vendor/bundle/gems/rspec-core-3.6.0/lib/rspec/core/configuration.rb:876:in `add_formatter'
/home/me/workspace/vendor/bundle/gems/rspecq-0.7.2/lib/rspecq/worker.rb:122:in `block in work'
/home/me/workspace/vendor/bundle/gems/rspecq-0.7.2/lib/rspecq/worker.rb:94:in `loop'
/home/me/workspace/vendor/bundle/gems/rspecq-0.7.2/lib/rspecq/worker.rb:94:in `work'
/home/me/workspace/vendor/bundle/gems/rspecq-0.7.2/bin/rspecq:182:in `<top (required)>'
/home/me/workspace/vendor/bundle/bin/rspecq:23:in `load'
/home/me/workspace/vendor/bundle/bin/rspecq:23:in `<top (required)>'
I'm on rspecq 0.7.2 and rspec-core 3.4.4.
It looks like rspecq/worker.rb is trying to pass formatter instances to RSpec.configuration.add_formatter, but downstream from that function, it appears that RSpec::Core::Formatters::Loader#custom_formatter expects either a string or a class. Am I reading this correctly?
In 42d20e7 we started redirecting stderr to stdout, so that we display a helpful message in case the split command fails.
However, this introduced a bug. If the dry-run command prints something to stderr but still succeeds (e.g. a deprecation warning coming from some gem or some application initializer), files_to_example_ids fails because the output doesn't contain only JSON.
A failing test case that reproduces the issue can be found in branch gh34-testcase.
As 42d20e7 suggested, we should grab both streams separately instead of redirecting stderr to stdout.
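A sketch of capturing the two streams separately with the stdlib `Open3.capture3`. The method name `run_and_parse_json` and the demo command are hypothetical; the real split command and its JSON shape are not reproduced here.

```ruby
require "json"
require "open3"
require "rbconfig"

# Hypothetical helper: runs a command, parses only its stdout as JSON and
# surfaces stderr separately, so that warnings cannot corrupt the JSON.
def run_and_parse_json(*cmd)
  stdout, stderr, status = Open3.capture3(*cmd)

  unless status.success?
    raise "command failed (exit #{status.exitstatus}): #{stderr}"
  end

  warn(stderr) unless stderr.empty? # e.g. deprecation warnings
  JSON.parse(stdout)
end

# Demo: a child process that warns on stderr but prints valid JSON on stdout.
script = 'warn "DEPRECATION: something"; puts %({"examples": []})'
p run_and_parse_json(RbConfig.ruby, "-e", script)
```

On failure the stderr contents still end up in the error message, which keeps the helpful output that the original redirect was meant to provide.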
This depends on #1.
Right now we resort to shelling out and executing rspec in another process:
Line 156 in 9f1e6fb
This is less than ideal since each project might have its own convention of calling into rspec (binstub, bundle exec or others). We should instead do this programmatically like we already do with the other aspects of the worker. I suspect we can call straight into RSpec::Core::Runner and pass the correct arguments (--dry-run etc.).
Currently the only Redis option is -r, --redis HOST. It would be good to have a URL option as well, to be able to specify the port and password.
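For illustration, a Redis URL can carry the host, port, password and database number in one value. The sketch below pulls these apart with Ruby's stdlib URI; the method name `parse_redis_url` and the returned Hash shape are assumptions, and 6379 and database 0 are used as fallbacks since they are Redis's usual defaults.

```ruby
require "uri"

# Hypothetical helper: splits a redis:// URL into connection settings.
def parse_redis_url(url)
  uri = URI.parse(url)
  unless %w[redis rediss].include?(uri.scheme)
    raise ArgumentError, "not a redis URL: #{url}"
  end

  db = uri.path.to_s.delete_prefix("/")
  {
    host: uri.host,
    port: uri.port || 6379,
    password: uri.password,
    db: db.empty? ? 0 : Integer(db)
  }
end

p parse_redis_url("redis://redis:6379/8")
```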
Right now the reporter waits 30 seconds for the first worker to boot and publish the queue. Depending on the codebase, this might be too much or too little. Therefore, we should make this configurable via the CLI.
Line 200 in 83bd0c0
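A sketch of a polling helper with a configurable timeout, assuming the reporter checks some "queue published" condition in a loop. The method name `wait_until` and its parameters are hypothetical; only the error wording mirrors the message rspecq prints today.

```ruby
# Hypothetical helper: polls the given block until it returns a truthy
# value, and raises once `timeout` seconds have elapsed.
def wait_until(timeout:, interval: 0.5)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout

  until yield
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "Queue not yet published after #{timeout} seconds"
    end

    sleep interval
  end

  true
end

# The timeout would come from a CLI flag; here it is hard-coded for the demo.
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
wait_until(timeout: 2, interval: 0.05) do
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started > 0.1
end
```

Using the monotonic clock avoids surprises if the wall clock is adjusted while the reporter is waiting.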
Enabling StatsD reporting could be done via a CLI flag, --statsd, that would accept a host/IP. Additionally we should fall back to the environment variable RSPECQ_STATSD.
Metrics we could report (<ns> stands for <namespace>) grouped by type:
<ns>.builds.total
<ns>.builds.successful
<ns>.builds.successfulFlaky
<ns>.builds.failed
<ns>.builds.failed_fast
<ns>.builds.errored
<ns>.totalRuntime
<ns>.queueInitRuntime
<ns>.slowestJobs.<job>
<ns>.examples
<ns>.flakeyTests
<ns>.requeues
<ns>.failures
<ns>.errors
<ns>.workerFailures
<ns>.specFiles
<ns>.queueSize
<ns>.filesSplitted
<ns>.jobsFromSplit
<ns>.untimedJobs
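As a rough sketch of the reporting side: StatsD counters are plain text lines of the form `name:value|c`, usually sent over UDP to port 8125. The helper names below are hypothetical, and no StatsD client gem is assumed; the flag-then-environment fallback follows the proposal above.

```ruby
require "socket"

# Hypothetical helpers; none of these exist in rspecq.

# Formats a StatsD counter line such as "ci.builds.total:1|c".
def statsd_counter(namespace, metric, value = 1)
  "#{namespace}.#{metric}:#{value}|c"
end

# The --statsd flag wins; otherwise fall back to RSPECQ_STATSD.
def statsd_host(cli_value, env = ENV)
  cli_value || env["RSPECQ_STATSD"]
end

# Sends one metric line over UDP (fire-and-forget).
def send_metric(host, payload, port: 8125)
  socket = UDPSocket.new
  socket.send(payload, 0, host, port)
ensure
  socket&.close
end

host = statsd_host(nil, "RSPECQ_STATSD" => "127.0.0.1")
puts "would send #{statsd_counter('ci', 'builds.total')} to #{host}"
```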