skroutz / rspecq

Distribute and run RSpec suites among parallel workers; for faster CI builds

Home Page: https://rubygems.org/gems/rspecq

License: MIT License

Ruby 100.00%

rspec ci test-runner rspec-testing ci-tools rspec-suite rspec-runner test-runners

rspecq's Issues

High redis commands/second on parallelization > 1

Hi all

I'm seeing unusual Redis thrashing that makes the test suite run far longer than it should and overloads Redis:

[Graphs of Redis commands/second: 1 worker, 2 workers, 8 workers]

Further information:
1030 tests in spec suite
redis 3.2 running in a docker container
ruby 2.7.4 running in a docker container

gems:
rspec 3.9.0
rspecq 0.7.1
redis 4.1.2
redis-rails 5.0.2

Interestingly, the commands/second rate is fine with 1 worker, but with 2 or more workers it jumps by orders of magnitude into the tens or hundreds of thousands and saturates the Redis instance, slowing everything down.

I am running the commands with the following:

RSPECQ_BUILD=<8 digit randomly generated string>
RSPECQ_REDIS_URL=redis://redis:6379/8
RSPECQ_MAX_REQUEUES=3

for i in $sequence; do
    echo "Testing $i"
    TEST_ENV_NUMBER=$i bundle exec rspecq --worker=$i "$FILES" > /tmp/rspecq_${RSPECQ_BUILD}_${i} 2>&1 &
    pids[${i}]=$!
done

Performance report

Hi! 👋 I ported our tests from parallel_tests to rspecq to see if it might be a viable alternative for us. The architecture of a central test queue is something I've been looking for in Ruby testing for some time, and I'm optimistic about the future of this gem. So this isn't a bug report, but a performance report that I thought you might find useful.

First, here's a little background info. Our unit tests run in our Kubernetes staging cluster, on 16-core AWS EC2 machines. With parallel_tests, the suite typically takes around 8 minutes and 30 seconds to complete, give or take 15 seconds.

To test, I wrote a runner script, rspecq_runner, and a wrapper script, rspecq, to spin up multiple runners in parallel.

`rspecq_runner`

#!/bin/bash

echo "Setting up database $TEST_ENV_NUMBER..."
bin/rails db:setup &> /dev/null

echo "Running tests $TEST_ENV_NUMBER..."
bundle exec rspecq --build=$TEST_ID --update-timings --worker=$TEST_ENV_NUMBER spec/

echo "Dropping database $TEST_ENV_NUMBER..."
bin/rails db:drop &> /dev/null

`rspecq`

#!/bin/bash

# By default, run one rspecq runner per CPU thread
CPU_COUNT=$(getconf _NPROCESSORS_ONLN)
TEST_ID=$RANDOM

# Uncomment to hard-code how many rspecq runners you want
# CPU_COUNT=14

echo "Starting test $TEST_ID"

for i in $(seq $CPU_COUNT); do
   # TEST_ENV_NUMBER is used to ensure each runner connects to its own Redis and DB instance
   export TEST_ENV_NUMBER=$i
   export TEST_ID
   ./bin/rspecq_runner &
done

# The reporter blocks until the build finishes and then prints the summary
bundle exec rspecq --build=$TEST_ID --report

# Let every runner finish dropping its database before exiting
wait

echo "done"

Results

First, I let the tests run a couple of times on each of our test servers. This was done to ensure timings were stored in Redis. I then ran our test suite 8 times to see how long it took. Here's what I found:

Run	Test run time (mm:ss)
1	15:30
2	15:32
3	14:55
4	14:41
5	16:01
6	15:13
7	16:04
8	15:27

I then wanted to ensure I wasn't overloading the test machine, so I slowly incremented the number of runners:

`CPU_COUNT`	Runtime (mm:ss)
2	27:07
4	19:26
6	17:18
8	16:25
10	16:15
12	15:47
14	15:15

Final thoughts

I was hoping for a quick port and magically everything would be faster. Turns out, there are no silver bullets.

As you can see from the numbers, it looks like a simple port from parallel_tests to rspecq would nearly double our CI time.

One option I do find interesting is spinning up runner pods that connect to a central Redis. This would allow us to distribute the CI load more evenly across our k8s cluster. I assume this would also scale better horizontally, trading cluster CPU time for test speed without vertically scaling our EC2 instances. However, this seems like quite a bit more effort, and I won't have time to pursue it in the short term.

Thanks again for working on this gem. As I said before, I'm still optimistic about the future of rspecq. It was very easy to get up and running. If the runtimes were comparable, I would advocate internally for an immediate switch.

Fail-fast builds upon too many failures

There should be a way to have builds fail early if, for example, 50% of the executed examples are failures. This should be configurable and toggled on demand (e.g. in some cases a user might really want to see all failures).
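A minimal sketch of such a check, assuming the reporter knows how many examples have run and failed so far. The method name, the `threshold` fraction, and the `min_examples` guard are hypothetical, not existing rspecq options.

```ruby
# Hypothetical fail-fast check: abort once enough examples have run and the
# failure ratio reaches a configurable threshold.
# threshold    - fraction of failures (0.5 == 50%); nil disables fail-fast
# min_examples - avoid aborting on the very first results (assumed guard)
def fail_fast?(executed:, failed:, threshold:, min_examples: 20)
  return false if threshold.nil?          # user wants to see all failures
  return false if executed < min_examples # too few results to judge yet
  failed.fdiv(executed) >= threshold
end
```

Keeping `nil` as the "off" value matches the on-demand toggle the issue asks for.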

Reproducing flaky tests

Currently we requeue failures to guard against flaky tests. We also keep track of flaky tests and report them via Sentry (#21).

However, there is not much use of this feature until we also provide a way to reproduce these tests.

This is an umbrella task to track things we can do to make reproducing flaky tests in development mode possible.

Remove sentry as a required dependency

We use a different bug tracking tool than Sentry. Adding rspecq to the Gemfile and running some tests results in failures. I have traced this back to conflicts between the two bug tracking tools.

Since it doesn't make sense for us to include sentry in our dependency graph, does it make sense for sentry to be an optional dependency?

When reproduction flag is passed build/worker ids are not required

Since we can now pass the reproduction flag, which runs rspecq in a kind of test mode, we could also have it auto-generate a build and worker id.

This way at least the developers would not have to restart redis or always change the build id when testing locally.
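A possible shape for this, sketched with Ruby's stdlib `SecureRandom`. The `resolve_ids` helper and the `repro-` prefixes are hypothetical; only the `--build`/`--worker` flag names come from this thread.

```ruby
require "securerandom"

# Hypothetical id resolution: in reproduction mode, missing build/worker ids
# are generated, so every local run gets a fresh queue and no stale state.
def resolve_ids(build_id: nil, worker_id: nil, reproduction: false)
  if reproduction
    build_id  ||= "repro-#{SecureRandom.hex(4)}"         # 8 random hex chars
    worker_id ||= "repro-worker-#{SecureRandom.hex(4)}"
  end
  raise ArgumentError, "--build is required" if build_id.nil?
  raise ArgumentError, "--worker is required" if worker_id.nil?
  [build_id, worker_id]
end
```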

Set TTL to expire keys instead of relying on the Redis eviction policy

Right now, we implicitly require administrators to configure Redis with maxmemory and maxmemory-policy set to allkeys-lru. While this is convenient for us, it's not very bulletproof, since we can't guarantee that the timings key won't be evicted. For example, if no builds have run for some time and another application shares the same instance, Redis might reach maxmemory for other reasons and evict the timings key.

Instead of relying on this configuration, we could explicitly set TTLs on all the keys we use, except for those that should be persisted, i.e. timings.

This probably obsoletes #5
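A sketch of the idea, assuming a client that responds to `expire(key, seconds)`, as redis-rb's `Redis` does (a key can also be written with a TTL directly via `set(key, value, ex: seconds)`). The per-build key names and the one-week TTL are assumptions for illustration.

```ruby
# Assumed TTL for per-build keys; "timings" is kept forever.
BUILD_KEY_TTL = 7 * 24 * 60 * 60 # one week, in seconds
PERSISTENT_KEYS = ["timings"].freeze

# Give every per-build key an explicit TTL, skipping the persistent ones.
def expire_build_keys(redis, keys, ttl: BUILD_KEY_TTL)
  keys.reject { |key| PERSISTENT_KEYS.include?(key) }
      .each { |key| redis.expire(key, ttl) }
end
```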

Integrate indirectly using a pub/sub mechanism

Instead of integrating other tools directly into rspecq (as we do with Sentry), we could integrate indirectly by publishing events at key points of the flow, using an easy first choice such as ActiveSupport::Notifications.
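To show the publish/subscribe shape without adding a dependency, here is a minimal hand-rolled event bus standing in for ActiveSupport::Notifications (whose `instrument`/`subscribe` calls follow a similar pattern). The `Events` module and the event names are hypothetical.

```ruby
# Minimal in-process event bus; a stand-in for ActiveSupport::Notifications.
module Events
  @subscribers = Hash.new { |hash, name| hash[name] = [] }

  # Register a block to be called for every event with this name.
  def self.subscribe(name, &block)
    @subscribers[name] << block
  end

  # Call every subscriber of `name` with the payload hash.
  def self.publish(name, payload = {})
    @subscribers[name].each { |subscriber| subscriber.call(payload) }
  end
end

# Inside rspecq, at a key point of the flow (hypothetical event):
#   Events.publish("job_requeued.rspecq", job: job, worker: worker_id)
# An integration such as a Sentry plugin would live outside rspecq:
Events.subscribe("job_requeued.rspecq") { |payload| warn "requeued: #{payload[:job]}" }
```

rspecq would then only publish; each integration decides what to subscribe to.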

Make max requeues number configurable

Right now it's set by a constant. Instead, it should be set via a CLI flag to the worker, e.g. --max-requeues.

Warn when the Redis instance is not configured to expire keys

RSpecQ never expires keys from Redis. Instead it assumes the instance is configured to do so; see #55.

To protect users from this caveat, the reporter or the workers could detect the configuration (e.g. CONFIG GET) and warn the user if the instance is not configured to expire keys.
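A sketch of the warning logic, assuming the two settings have already been read with `CONFIG GET maxmemory` and `CONFIG GET maxmemory-policy` (e.g. through redis-rb's `config(:get, ...)`). Because RSpecQ sets no TTLs, volatile-* policies never evict its keys; only allkeys-* policies with a non-zero maxmemory do. The method name and messages are hypothetical.

```ruby
# Policies that may evict keys without a TTL (allkeys-lfu needs Redis >= 4.0).
EVICTING_POLICIES = %w[allkeys-lru allkeys-lfu allkeys-random].freeze

# Returns a warning string, or nil when the instance will evict RSpecQ keys.
def expiry_warning(maxmemory:, policy:)
  if maxmemory.to_i.zero?
    "Redis maxmemory is 0 (unlimited); RSpecQ keys will never be evicted."
  elsif !EVICTING_POLICIES.include?(policy)
    "Redis maxmemory-policy is #{policy}; RSpecQ keys have no TTL, " \
      "so only allkeys-* policies will evict them."
  end
end
```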

Automatically handle parallel execution

First, this gem is off to a great start! Thank you for putting your time into an open source library that helps the ruby community move forward!

I'm currently using parallel_tests for a test suite that takes a couple of hours to run without parallelization. Two killer features it has are:

setup script: bin/rails parallel:setup
flag: -n [PROCESSES] How many processes to use, default: available CPUs

If you're not already familiar with these, the main idea is that parallel_tests expects you to run multiple processes on the same machine. Each process gets an environment variable, TEST_ENV_NUMBER, which gives it a distinct ID. This ID can be used in the Rails database.yml, e.g. database: test_<%= ENV['TEST_ENV_NUMBER'] %>, which allows each process to use the same DB server but a unique database during the test run.

Are there plans for rspecq to do something similar?

I think the main benefit of handling this in rspecq is that it solves the problem once, rather than requiring every consumer to do similar setup work. Of course, that setup work isn't necessary if your runners are distributed. But this would give parallel_tests users a convenient on-ramp when switching to rspecq.
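A rough sketch of parallel_tests-style local parallelism on top of the existing CLI: one worker process per CPU, each with its own TEST_ENV_NUMBER. The helper names are hypothetical, database setup (the `parallel:setup` equivalent) is not covered, and only the `--build`/`--worker` flags come from this thread.

```ruby
require "etc"

# Build one [env, argv] pair per worker; TEST_ENV_NUMBER starts at 1.
def worker_commands(build:, count: Etc.nprocessors, files: ["spec/"])
  (1..count).map do |n|
    env = { "TEST_ENV_NUMBER" => n.to_s }
    argv = ["bundle", "exec", "rspecq", "--build=#{build}", "--worker=#{n}", *files]
    [env, argv]
  end
end

# Spawn all workers without a shell and wait for them.
# Returns true only if every worker exited successfully.
def run_workers(commands)
  pids = commands.map { |env, argv| Process.spawn(env, *argv) }
  pids.map { |pid| Process.wait2(pid).last.success? }.all?
end
```

Usage might look like `run_workers(worker_commands(build: ENV.fetch("RSPECQ_BUILD")))`.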

Preserve older timings keys

Currently we keep a single timings key (timings). If that is somehow lost, all scheduling is then thrown out of the window. To avoid mishaps, we could also store the older timings keys (e.g. timings:<timestamp>) and use them if the latest one is somehow deleted.

Going a step further, we could also persist the key to disk and use it if no timings keys were found.
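A sketch of the proposed fallback chain. `store` stands in for Redis as any Hash-like object of key name to JSON string; a real implementation would use Redis commands instead. The backup file path is hypothetical.

```ruby
require "json"

# Fallback order: latest "timings" key, newest "timings:<timestamp>" key,
# a copy persisted on disk, and finally an empty hash (no timings).
def load_timings(store, backup_path: "tmp/rspecq_timings.json")
  return JSON.parse(store["timings"]) if store["timings"]

  older = store.keys.grep(/\Atimings:\d+\z/).max_by { |key| key.split(":").last.to_i }
  return JSON.parse(store[older]) if older

  File.exist?(backup_path) ? JSON.parse(File.read(backup_path)) : {}
end
```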

Using file split to run individual examples as jobs using lots of memory?

Hi all!

Solid library you guys have here, been using it for a CI test optimization that I've been working on and it's been great.
Lately I've been trying the --file-split-threshold feature to split one of my slow Ruby on Rails test files so that multiple workers can work on it. The file has ~1000 examples, each taking around 0.5-1s, and running it as a single job takes about 400s.

When I split the file using --file-split-threshold, it gets split into ~1000 jobs, and none of the workers even get past the 50th job: it turns out the worker containers run out of memory (exit code 137).

I attached a memory graph of this behavior (for a single worker only).

To check whether the problem lies in my tests or in the library, I pulled the rspecq repo and added this test:

# my_spec.rb
RSpec.describe do
  1000.times do
    it do
      expect(true).to be true
    end
  end
end

Dry run

redis-cli flushdb
bundle exec rspecq --build=0 --seed 1 --worker=1 \
  --update-timings test/sample_suites/timings/spec/my_spec.rb

...then run it for file splitting

bundle exec rspecq --build=1 --seed 1 --worker=1 --update-timings --file-split-threshold 0 test/sample_suites/timings/spec/my_spec.rb

It turns out it hogs memory on this dummy test as well, with memory usage steadily increasing up to 3.7 GB.

(I modified the logging slightly to make the executed examples easier to see.)

From the graph, I'm fairly sure this indicates a memory leak (or bloat?), so I'm wondering whether I'm missing something needed to use the --file-split-threshold option, maybe a configuration I have to specify in spec_helper.rb. This is the spec_helper.rb I used for my test:

RSpec.configure do |config|
  # other config here, not really relevant

  config.filter_run_when_matching :focus
end

Ruby version is 2.7.2

Can you help me look into this? I've been struggling with this out-of-memory problem for a while 😢

Thanks 🙇‍♂️!

Create an introductory short video

A short video that shows how to:

run a few workers in parallel
run the reporter to view the build progress

...could help with onboarding.

Support native RSpec example filtering

For test performance, we split our suite into acceptance and unit tests. We do this because acceptance specs require static assets to be built, which takes a decent amount of time, whereas unit specs can run instantly. We accomplish this with RSpec tags. For example, here is how we run our acceptance tests using rspec-queue:

bin/_rspec-queue \
  --namespace acceptance \
  --tag capybara_feature \
  --tag type:feature \
  --tag type:request \
  --tag js \
  --tag webpack \
  --format=doc \
  --format=RspecJunitFormatter \
  --out="tmp/test_results/rspec_acceptance/results-$CIRCLE_NODE_INDEX.xml" \
  --requeue-tolerance=0.05 \
  --max-requeues="$CIRCLE_NODE_TOTAL"

In order to use rspecq, we need to mimic this functionality. As of now, rspecq does not seem to support example filtering; is that right? If so, would it be possible to support this use case?
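As a rough illustration of the semantics involved: RSpec runs an example when it matches any of the `--tag` inclusion filters. The sketch below approximates that for queued examples, treating `--tag js` as "metadata is truthy" and `--tag type:feature` as a string comparison. It is a simplification (no `~` exclusions, no value coercion), not RSpec's own implementation, and the helper names are hypothetical.

```ruby
# Turn ["js", "type:feature"] into [[:js, nil], [:type, "feature"]].
def parse_tags(tags)
  tags.map do |tag|
    key, value = tag.split(":", 2)
    [key.to_sym, value]
  end
end

# True if the example's metadata matches at least one inclusion filter.
def matches_any?(metadata, filters)
  filters.any? do |key, value|
    value.nil? ? !!metadata[key] : metadata[key].to_s == value
  end
end
```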

Add tests

Sentry integration

Exceptions in rspecq are naturally visible and rare, since the whole build will fail. However there are events which may not be errors, but affect QoS:

requeued lost job (i.e. worker went faulty); degrades performance
no timings found (i.e. jobs will be scheduled randomly); performance killer
error while trying to split slow spec files; performance killer

We should emit those to Sentry.

Fail build if published queue is empty

If there are no spec files in a project (i.e. the queue is empty), the build should be considered a failure (i.e. by the reporter), as a safety net against unexpected scenarios.

Integrate rubocop in CI

Provide info on how to reproduce flaky spec

It would be nice to also include instructions on how to reproduce (locally) the execution order that led to the error.

@agis wrote on #31

We could perhaps submit the N (5-10) jobs that run prior to the flaky one, as a best-effort approach.

One thing we could also do, but this too is not so straightforward, is to emit the RSpec seed to Sentry. We could do these in next iterations.
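A best-effort sketch of the idea: take the N jobs a worker ran before the flaky one and print an `rspec` command with the seed (`--seed` is a standard RSpec option). The inputs and the helper name are hypothetical, and running the files together under a seed does not guarantee the exact order the worker used.

```ruby
# Build a local reproduction command from the jobs executed before the flaky one.
def reproduction_command(executed_jobs, flaky_job, seed:, previous: 5)
  index = executed_jobs.index(flaky_job)
  raise ArgumentError, "unknown job: #{flaky_job}" if index.nil?

  jobs = executed_jobs[[index - previous, 0].max..index]
  "bundle exec rspec --seed #{seed} #{jobs.join(' ')}"
end
```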

Provide visibility to flaky tests

Flaky tests, while not causing the build to fail, should still be fixed. Otherwise they can impact build times, since they can silently compound and cause many retries in each build. Consider a test suite that over time accumulates 30 flaky tests: if each one is retried three times, they could easily cause 90 additional example retries.

We should provide visibility into flaky tests (once we have determined that they are flaky). Now that #16 is merged, we can report them to Sentry.

print to stdout of reporter
submit to Sentry

cli: Document environment variables

Improve flaky test reporting in Sentry

Currently, all flaky tests are emitted as a single event with the same title, "Flaky jobs detected". Thus, flaky job events from different CI builds all end up under a single Sentry event. For instance, one of our test suites has a single flaky job event that collects all of them.

The problem with this approach is that it's hard to answer questions such as:

when was this particular flaky test introduced?
which file has the most flaky tests?
which files currently contain flaky tests?

Also, it's impossible to set alerts (e.g. using code owners) based on the file in which flaky jobs occur, or to collaborate on specific issues to fix a particular flaky job (since a specific flaky job can't be resolved on its own).

We have to think of a better way to report flaky jobs, whether this involves changing the fingerprint of the events, submitting separate events per flaky job/file, changing the title of events, or a combination of these.
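One possible direction, sketched in plain Ruby: group flaky jobs by spec file and build one event per file, with the file in both the title and the fingerprint so each file gets its own issue. The payload shape is hypothetical and how it maps onto the Sentry SDK is left open; job ids are assumed to be file paths or RSpec example ids such as `./spec/a_spec.rb[1:2]`.

```ruby
# Group flaky job ids by file and build one event payload per file.
def flaky_events(flaky_jobs)
  flaky_jobs
    .group_by { |job| job.split(/[\[:]/, 2).first } # strip "[1:2]" or ":10"
    .map do |file, jobs|
      {
        title: "Flaky jobs detected in #{file}",
        fingerprint: ["rspecq-flaky", file],
        jobs: jobs.sort,
      }
    end
end
```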

Read configuration from environment variables

In addition to the command-line flags we support, every configuration setting should also be settable via an environment variable. This will ease integration with CI servers.
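A sketch of flag-with-env-fallback parsing using Ruby's stdlib OptionParser. RSPECQ_BUILD, RSPECQ_REDIS_URL and RSPECQ_MAX_REQUEUES appear earlier in this thread; the `parse_options` helper, the precedence (flag over environment variable), and the default of 3 requeues are assumptions.

```ruby
require "optparse"

# Environment variables provide defaults; command-line flags override them.
def parse_options(argv, env = ENV)
  opts = {
    build: env["RSPECQ_BUILD"],
    redis_url: env["RSPECQ_REDIS_URL"],
    max_requeues: Integer(env.fetch("RSPECQ_MAX_REQUEUES", "3")), # assumed default
  }
  OptionParser.new do |o|
    o.on("--build ID") { |v| opts[:build] = v }
    o.on("--redis-url URL") { |v| opts[:redis_url] = v }
    o.on("--max-requeues N", Integer) { |v| opts[:max_requeues] = v }
  end.parse!(argv)
  opts
end
```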

Automatically set file split threshold

Default Worker#file_split_threshold to nil

Currently, we default it to a very big number (999999) to effectively disable it (because no jobs take more than that). That doesn't make much sense, we should instead be able to set it to nil to disable the splitting mechanism. The default should also be nil.
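A tiny sketch of the proposed semantics, where `nil` disables splitting outright instead of relying on a sentinel such as 999999. The method name and the strict `>` comparison are assumptions.

```ruby
# nil (the proposed default) means "never split"; otherwise split files
# whose recorded duration exceeds the threshold (in seconds).
def split_file?(file_duration, file_split_threshold = nil)
  return false if file_split_threshold.nil?

  file_duration > file_split_threshold
end
```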

Use a proper logger for rspecq-level messages

Currently we use plain puts inside Worker and Reporter to print various rspecq-level events, such as errors or warnings. However, this mixes rspecq's output with the output generated by RSpec, making it hard to tell the two apart at a glance. Ideally, RSpec should print to stdout (the default) and rspecq should use stderr for its own diagnostic messages.

We should also use an actual logger with appropriate levels, for more detailed output. Ruby's stdlib Logger should be sufficient.
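A minimal sketch with the stdlib Logger writing to stderr, leaving stdout to RSpec's formatters. The `build_logger` helper, the progname, and the line format are assumptions.

```ruby
require "logger"

# rspecq diagnostics go to stderr; RSpec keeps stdout.
def build_logger(io = $stderr, level: Logger::INFO)
  logger = Logger.new(io, progname: "rspecq")
  logger.level = level
  logger.formatter = proc do |severity, _time, progname, msg|
    "[#{progname}] #{severity}: #{msg}\n"
  end
  logger
end
```

For example, `build_logger.warn("No timings found")` would print `[rspecq] WARN: No timings found` on stderr.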

Find a way to get individual worker output

In general, we want to know what each worker did after it is done, which files it ran, etc.

A tool for new releases

run tests (one last time)
rubocop
update version (requires some kind of input)
update changelog (and allow for review before moving forward)
create and publish gem
create and push git tag

The tool should be idempotent: if it stops for any reason at any of those steps, it should be able to continue from the last completed step.
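A sketch of the resumable part: completed steps are recorded in a JSON state file, so a rerun after a failure skips them. The `ReleaseRunner` class and state-file approach are hypothetical; each step's block would run the real command (tests, rubocop, gem build, and so on).

```ruby
require "json"

# Runs named steps once; progress survives restarts via a state file.
class ReleaseRunner
  def initialize(state_path)
    @state_path = state_path
    @done = File.exist?(state_path) ? JSON.parse(File.read(state_path)) : []
  end

  # Yields unless the step already completed. If the block raises, the step
  # is not recorded and will run again on the next invocation.
  def step(name)
    return :skipped if @done.include?(name)

    yield
    @done << name
    File.write(@state_path, JSON.generate(@done))
    :done
  end
end
```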

sentry-raven is deprecated

Post-install message from sentry-raven:
sentry-raven is deprecated! Please migrate to sentry-ruby

See https://docs.sentry.io/platforms/ruby/migration for the migration guide.

RuntimeError: Queue not yet published after 30 seconds

I got this error, RuntimeError: Queue not yet published after 30 seconds, with a Redis server listening on 127.0.0.1:6379
when running

bundle exec rspecq --build=123 --worker=foo1 spec/models/car_spec.rb

Can you help to fix this error?

Report dead workers to Sentry

In the event a worker dies (i.e. fails to emit a heartbeat in the specified timeframe) we should emit a warning to Sentry and also print a relevant warning to stdout.
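A sketch of the detection step, assuming heartbeats are available as a hash of worker id to the Unix timestamp of the last heartbeat; this data shape and the 60-second window are assumptions. The caller would then print the warning and emit it to Sentry.

```ruby
# Workers whose last heartbeat is older than `timeout` seconds are dead.
def dead_workers(heartbeats, now: Time.now.to_i, timeout: 60)
  heartbeats.select { |_worker, last_seen| now - last_seen > timeout }.keys
end
```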

"Formatter ... unknown" error on startup

I have not gotten rspecq to run successfully. When I execute bundle exec rspecq -b mybuild -w myworker, I get the following error message and stack.

bundler: failed to load command: rspecq (/home/me/workspace/vendor/bundle/bin/rspecq)
ArgumentError: Formatter '#<RSpecQ::Formatters::FailureRecorder:0x00007f07da84beb0>' unknown - maybe you meant 'documentation' or 'progress'?.
  /home/me/workspace/vendor/bundle/gems/rspec-core-3.6.0/lib/rspec/core/formatters.rb:178:in `find_formatter'
  /home/me/workspace/vendor/bundle/gems/rspec-core-3.6.0/lib/rspec/core/formatters.rb:146:in `add'
  /home/me/workspace/vendor/bundle/gems/rspec-core-3.6.0/lib/rspec/core/configuration.rb:876:in `add_formatter'
  /home/me/workspace/vendor/bundle/gems/rspecq-0.7.2/lib/rspecq/worker.rb:122:in `block in work'
  /home/me/workspace/vendor/bundle/gems/rspecq-0.7.2/lib/rspecq/worker.rb:94:in `loop'
  /home/me/workspace/vendor/bundle/gems/rspecq-0.7.2/lib/rspecq/worker.rb:94:in `work'
  /home/me/workspace/vendor/bundle/gems/rspecq-0.7.2/bin/rspecq:182:in `<top (required)>'
  /home/me/workspace/vendor/bundle/bin/rspecq:23:in `load'
  /home/me/workspace/vendor/bundle/bin/rspecq:23:in `<top (required)>'

I'm on rspecq 0.7.2 and rspec-core 3.4.4.

It looks like rspecq/worker.rb is trying to pass formatter instances to RSpec.configuration.add_formatter, but downstream from that function, it appears that RSpec::Core::Formatters::Loader#custom_formatter expects either a string or a class. Am I reading this correctly?

worker: files_to_example_ids fails if something is printed to stderr

In 42d20e7 we started redirecting stderr to stdout, so that we display a helpful message in case the split command fails.

However, this introduced a bug. If the dry-run command prints something to stderr but still succeeds (e.g. a deprecation warning coming from some gem or some application initializer), files_to_example_ids fails because the output doesn't contain only JSON.

A failing test case that reproduces the issue can be found in branch gh34-testcase.

As 42d20e7 suggested, we should grab both streams separately instead of redirecting stderr to stdout.
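A sketch of grabbing both streams separately with the stdlib `Open3.capture3`: stdout is parsed as JSON, stderr is only shown on failure or forwarded as a warning. The `run_dry_run` helper is hypothetical.

```ruby
require "open3"
require "json"

# Run the dry-run command; keep stderr out of the JSON on stdout.
def run_dry_run(cmd)
  out, err, status = Open3.capture3(*cmd)
  unless status.success?
    raise "dry-run command failed (exit #{status.exitstatus}):\n#{err}"
  end

  warn err unless err.empty? # e.g. deprecation warnings, shown but not parsed
  JSON.parse(out)
end
```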

CI builds with different rspec-core versions

This depends on #1.

Split spec files into examples programmatically

Right now we resort to shelling out and executing rspec in another process (rspecq/lib/rspecq/worker.rb, line 156 in 9f1e6fb):

cmd = "DISABLE_SPRING=1 bin/rspec --dry-run --format json #{files.join(' ')}"

This is less than ideal since each project might have its own convention of calling into rspec (binstub, bundle exec or others). We should instead do this programmatically like we already do with the other aspects of the worker. I suspect we can call straight into RSpec::Core::Runner and pass the correct arguments (--dry-run etc.)

Allow to connect to Redis via url

Currently the only Redis option is -r, --redis HOST.
It would be good to also have a URL option, so that the port and password can be specified.
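A sketch showing which parts a `redis://` URL carries, parsed with the stdlib `URI`. redis-rb can also take the URL directly via `Redis.new(url: ...)`; the `redis_options` helper and the defaults (port 6379, db 0) are only illustrative.

```ruby
require "uri"

# Split a redis:// URL into host, port, password and database number.
def redis_options(url)
  uri = URI.parse(url)
  db = uri.path.to_s.delete_prefix("/")
  {
    host: uri.host,
    port: uri.port || 6379,
    password: uri.password,
    db: db.empty? ? 0 : Integer(db),
  }
end
```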

Make queue publish wait timeout configurable

Right now the reporter waits 30 seconds for the first worker to boot and publish the queue. Depending on the codebase, this might be too much or too little. Therefore, we should make it configurable via the CLI.

See rspecq/lib/rspecq/queue.rb (line 200 in 83bd0c0):

def wait_until_published(timeout=30)
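A sketch of a polling wait with a configurable timeout, using a monotonic clock. `published` is a callable standing in for the real Redis check, and the keyword names are hypothetical; the error message mirrors the one reported elsewhere in this thread.

```ruby
# Poll until the queue is published or the timeout (in seconds) expires.
def wait_until_published(published, timeout: 30, interval: 0.2)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  until published.call
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "Queue not yet published after #{timeout} seconds"
    end

    sleep interval
  end
  true
end
```

The CLI value (e.g. from a new flag) would simply be passed through as `timeout:`.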

Report number of pending examples in build summary

StatsD integration

API

Enabling StatsD reporting could be done via a CLI flag, --statsd, that would accept a host/IP. Additionally, we should fall back to the environment variable RSPECQ_STATSD.

Metrics

Metrics we could report (<ns> stands for <namespace>) grouped by type:

Counters

total number of builds <ns>.builds.total
number of successful builds <ns>.builds.successful
number of successful but flaky builds <ns>.builds.successfulFlaky
number of failed builds <ns>.builds.failed
number of failed-fast builds <ns>.builds.failed_fast
number of builds with a non-example error <ns>.builds.errored

Timers

[reporter] build total run time <ns>.totalRuntime
[worker] queue initialization run time <ns>.queueInitRuntime
[reporter] run times of slowest jobs (top 10) <ns>.slowestJobs.<job>

Gauges

[reporter] number of examples executed <ns>.examples
[queue] number of flaky examples <ns>.flakeyTests
[reporter] number of requeues <ns>.requeues
[reporter] number of example failures <ns>.failures
[reporter] number of non-example errors (e.g. syntax errors) <ns>.errors
[worker?] number of worker failures <ns>.workerFailures
[worker] total number of spec files <ns>.specFiles
[worker] total queue size (aka. number of jobs) <ns>.queueSize
[worker] number of spec files split <ns>.filesSplitted
[worker] number of jobs generated from the split files <ns>.jobsFromSplit
[worker] new (untimed) job received <ns>.untimedJobs
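A sketch of emitting these three metric types using the StatsD plain-text line protocol, `<name>:<value>|<type>` with `c` for counters, `ms` for timers and `g` for gauges, sent over UDP (8125 is the conventional StatsD port). The `StatsdClient` class and the "rspecq" namespace are hypothetical; the host would come from --statsd or RSPECQ_STATSD as proposed.

```ruby
require "socket"

# Minimal StatsD client: formats one metric per UDP datagram.
class StatsdClient
  TYPES = { counter: "c", timer: "ms", gauge: "g" }.freeze

  def initialize(host, port = 8125, namespace: "rspecq")
    @host = host
    @port = port
    @namespace = namespace
  end

  # e.g. line(:counter, "builds.failed", 1) gives "rspecq.builds.failed:1|c"
  def line(type, name, value)
    "#{@namespace}.#{name}:#{value}|#{TYPES.fetch(type)}"
  end

  def send_metric(type, name, value)
    @socket ||= UDPSocket.new # created lazily on first send
    @socket.send(line(type, name, value), 0, @host, @port)
  end
end

# client = StatsdClient.new(ENV.fetch("RSPECQ_STATSD", "127.0.0.1"))
# client.send_metric(:gauge, "queueSize", 1030)
```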
