
feditest's Introduction

Feditest: test federated protocols such as those in the Fediverse

This repo contains:

  • the FediTest test framework

which allows you to define and run test plans that involve constellations of servers (like Fediverse instances) whose communication you want to test.

The actual tests for the Fediverse are in their own repository.

For more details, check out feditest.org and find us on Matrix in #fediverse-testing:matrix.org.

Found a bug? You must be kidding; like in all of Arch Linux, there are no bugs in this software. But if we happen to be wrong, submit a bug report with as much detail as possible right here for this project on GitHub.

feditest's People

Contributors

steve-bate, jernst, mexon


feditest's Issues

Test that applications return descriptive error messages

(from this discussion: https://mastodon.social/@benpate/111161976484128397 )

Just this: there are tons of ways that an API call can fail, but most Fediverse software just returns something unhelpful, like 500 Internal Server Error.

For production software, that’s probably reasonable, but a well-written test suite should help people debug, not just tell them about the bug.

So if I have a voice, a test suite should really return errors like: “this message was not accepted because it is missing a ‘name’ field.”

Not sure what exactly we can do about this here, but let's record it anyway.

Also the follow-up in the thread:

Nomad returns delivery reports indicating the disposition of a message once it hits the server and lets normal people examine the reports for any message they send. We needed this years ago because Diaspora silently dropped 1/3 of all communications and we needed to be able to prove the issue wasn't our software's fault. The assumption was that site operators of that period were often high school kids that couldn't read or interpret log files, so this let us find federation issues without involving them.

It would be no small feat getting this kind of thing standardised and adopted in ActivityPub, but we do provide feedback of the http error code for all sites -- and if something is stuck in the queue, one can see the results of the individual delivery attempts. Nomad sites provide much more detailed answers of what happened - lack of permission, filter/blocking rule, duplicate, recipient not found, delivered, etc...

Create a full test run transcript in JSON

Right now, we have to re-run a TestRun to format a report differently. That's not a good idea given how expensive TestRuns are. Also, reports generally don't show all information that was collected during a TestRun, and if a user decides they needed more detail than a particular report shows, they have to re-run the TestPlan.

Instead:

  1. During a TestRun, silently log everything.
  2. Create an optional output format that dumps all that was recorded into a JSON file.
  3. Create a sub-command that can parse such a JSON log and generate other forms of report from it, without re-running the tests.

There's also the advantage of being able to generate several different reports from the same TestRun, such as a web page and a summary.
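
A minimal sketch of what step 3 could look like, assuming a hypothetical transcript layout with a top-level results list (the real schema is still to be defined):

    import json
    from collections import Counter

    def summarize_transcript(path: str) -> None:
        """Render a summary report from a previously saved TestRun transcript.

        Assumes a hypothetical schema:
        {"type": "testrun-transcript",
         "results": [{"test": ..., "outcome": "passed" | "failed" | "skipped"}, ...]}
        """
        with open(path, encoding="utf-8") as f:
            transcript = json.load(f)

        counts = Counter(result["outcome"] for result in transcript["results"])
        print(f"total: {sum(counts.values())}")
        for outcome in ("passed", "failed", "skipped"):
            print(f"{outcome}: {counts.get(outcome, 0)}")

A sub-command (name to be decided) could then wrap functions like this one for each report format, without re-running the tests.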

Create an implementation of FediverseNode that delegates to the Mastodon client API

Let's call this FediverseNode subclass MastodonClientApiNode.

The first scenario would be something like this:

  • Either modify MastodonUbosNodeDriver or create a version of SaasFediverseNodeDriver whose _provision_node instantiates MastodonClientApiNode.
  • In MastodonClientApiNode, override make_create_note so that, when invoked, it uses the Mastodon client API to actually create a Create activity with a Note, and returns the URL of the created activity.
  • Run a simplified version of DeliverToInboxTest to invoke create_note from a test.

If we can do this, we know we can do the rest of what this MastodonClientApiNode will have to do, too.

Some notes:

  • NodeDriver._provision_node is being handed the hostname in its parameters if specified in the TestPlan file.
  • How to get an oauth token is tbd. Is there a way to do this that:
    1. can be performed from behind the firewall, against a public Mastodon instance, without another cooperating public website? At all?
    2. ... and also can be scripted requiring no human intervention?
    3. If not, can it be automated against a locally running instance? (e.g. by invoking a script that runs a query against the Mastodon database; assume feditest has access to all relevant installation and db data including credentials)
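
For the make_create_note bullet above, a sketch of what the method could look like when backed by the Mastodon client API. The class is shown standalone here (in feditest it would subclass FediverseNode), and the constructor arguments and return value are assumptions; only the status-posting endpoint is standard Mastodon API:

    import requests

    class MastodonClientApiNode:
        """Sketch of a Node that drives an existing Mastodon instance via its client API."""

        def __init__(self, hostname: str, oauth_token: str):
            self._hostname = hostname
            self._oauth_token = oauth_token

        def make_create_note(self, content: str) -> str:
            # POST /api/v1/statuses is Mastodon's standard endpoint for creating a status (a Note).
            response = requests.post(
                f"https://{self._hostname}/api/v1/statuses",
                headers={"Authorization": f"Bearer {self._oauth_token}"},
                data={"status": content},
                timeout=30,
            )
            response.raise_for_status()
            # "uri" is the ActivityPub id of the created object; "url" is the HTML permalink.
            return response.json()["uri"]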

Tap report should only show tap, not the exceptions

E.g. currently:

% cd feditest-tests-sandbox
jernst$ ../feditest/venv/bin/feditest run --tap
2024-05-01T17:08:52Z [ERROR] feditest: FAILED test assertion: sandbox.example_test_with_functions::example_test2: 
Expected: <-56>
     but: was <0>
 Traceback (most recent call last):

  File "/Users/jernst/git/github.com/fediverse-devnet/feditest/venv/lib/python3.12/site-packages/feditest/__init__.py", line 91, in run
    self.test_function(**args)

  File "/Users/jernst/git/github.com/fediverse-devnet/feditest-tests-sandbox/tests/sandbox/example_test_with_functions.py", line 51, in example_test2
    assert_that(c, equal_to(-56))

  File "/Users/jernst/git/github.com/fediverse-devnet/feditest/venv/lib/python3.12/site-packages/hamcrest/core/assert_that.py", line 58, in assert_that
    _assert_match(actual=actual_or_assertion, matcher=matcher, reason=reason)

  File "/Users/jernst/git/github.com/fediverse-devnet/feditest/venv/lib/python3.12/site-packages/hamcrest/core/assert_that.py", line 73, in _assert_match
    raise AssertionError(description)

AssertionError: 
Expected: <-56>
     but: was <0>


TAP version 14
# test plan: None
# session: session_0
# constellation: A_vs_1
#   name: A_vs_1
#   roles:
#     - name: client
#       driver: sandbox.SandboxMultClientDriver_ImplementationA
#     - name: server
#       driver: sandbox.SandboxMultServerDriver_Implementation1
ok 1 - sandbox.example_test_with_functions::example_test1
ok 2 - sandbox.example_test_with_functions::example_test2
ok 3 - sandbox.example_test_with_classes::ExampleTest1
# session: session_1
# constellation: A_vs_2
#   name: A_vs_2
#   roles:
#     - name: client
#       driver: sandbox.SandboxMultClientDriver_ImplementationA
#     - name: server
#       driver: sandbox.SandboxMultServerDriver_Implementation2Faulty
ok 4 - sandbox.example_test_with_functions::example_test1
not ok 5 - sandbox.example_test_with_functions::example_test2
  ---
  problem: |
    Expected: <-56>
         but: was <0>
  ...
ok 6 - sandbox.example_test_with_classes::ExampleTest1
1..6
# test run summary:
#   total: 6
#   passed: 5
#   failed: 1
#   skipped: 0

Everything before TAP version 14 should go.
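
One possible way to get there, assuming the noise above is emitted via the standard logging module under the "feditest" logger: while producing TAP, route log records into a buffer (or a file) instead of the console.

    import io
    import logging

    def capture_logs_during_tap_run() -> io.StringIO:
        """Redirect feditest's log output into a buffer so stdout stays pure TAP."""
        buffer = io.StringIO()
        handler = logging.StreamHandler(buffer)
        handler.setFormatter(logging.Formatter("%(asctime)s [%(levelname)s] %(name)s: %(message)s"))

        logger = logging.getLogger("feditest")
        logger.handlers.clear()      # drop the console handler for the duration of the run
        logger.addHandler(handler)
        logger.propagate = False
        return buffer                # could be appended as "# ..." TAP diagnostics, or discarded
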

In HTML reports, show app, not NodeDriver

Right now we say, for example:

client: imp.ImpInProcessNodeDriver
server: saas.SaasFediverseNodeDriver

or

server: mastodon.MastodonUbosNodeDriver

This should be a representation of Node (e.g. Mastodon), not NodeDriver (e.g. mastodon.MastodonUbosNodeDriver)

Somewhere in the constellation details it can say what driver was used.

Capture metadata for the fediverse server applications we want to test

There may be all sorts of things that we want to capture, and we need to come up with a way to do this. This includes:

  • Platforms on which the server application runs.
  • Whether it requires HTTPS to communicate with it, even during testing.
  • Which "profiles" it supports, so we don't attempt to run tests that are known to fail. (How to define profiles is a separate issue.)
  • Whether the server application will have certain behaviors or not. Example: a server application that is known to never create ephemeral objects (as defined in the AP spec) can and should be tested to always emit resolvable IDs.
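
A sketch of what such a metadata record might look like as a Python dataclass; all field names and example values are illustrative:

    from dataclasses import dataclass, field

    @dataclass
    class AppMetadata:
        """Per-application metadata as discussed above; shape is a proposal, not the current code."""
        name: str
        platforms: list[str] = field(default_factory=list)   # e.g. ["ubos", "saas"]
        requires_https: bool = True                           # even during testing
        supported_profiles: list[str] = field(default_factory=list)
        never_creates_ephemeral_objects: bool = False         # if True, always expect resolvable ids

    mastodon = AppMetadata(name="Mastodon", platforms=["ubos"], requires_https=True)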

Clearly define how role names in constellations map to parameters in the actual tests

The original plan -- simply matching them by name -- does not work, because a given test plan may run tests from multiple test groups (like WebFinger and ActivityPub), and the role names for the nodes in those test groups are different.

We could go by sequence, but that would make it difficult to include tests in the same test plan that do not use all of the nodes that other tests in the same test plan use. But: is that really a requirement?
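
One possible direction, purely hypothetical: let each test plan entry map the test's own role names onto constellation role names explicitly, so names stay local to each test group. The field name and shape below are made up for illustration:

    # Hypothetical shape of an explicit role mapping in a test plan entry.
    # Keys are the role names the test declares; values are constellation role names.
    test_spec = {
        "name": "webfinger.server.some_test",   # placeholder test name
        "rolemapping": {
            "client": "A",   # the test's "client" role is played by constellation node "A"
            "server": "1",
        },
    }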

Create an implementation of FediverseNode that acts as a gRPC client

Similar to #94, that would enable any Fediverse application, not just those implementing the Mastodon client API, to participate in FediTest automation.

gRPC seems to be the protocol of choice.

We don't really need a server implementation; we just need to know how to get to the generated interfaces in various languages.

Subprocess Driver?

Do you think it would be useful to have a subprocess driver that would support provisioning and unprovisioning a node using a shell script (or some other program)? This would be a more automated approach than the manual driver, but not as full-featured as the UBOS driver. It might also be useful for integration testing of the framework outside of a UBOS container.
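
A rough sketch of the idea, shelling out to user-supplied scripts; the class and method names and parameters are illustrative, not the actual NodeDriver API:

    import subprocess

    class ShellScriptNodeDriver:
        """Provision/unprovision a Node by running user-supplied scripts."""

        def __init__(self, provision_cmd: list[str], unprovision_cmd: list[str]):
            self._provision_cmd = provision_cmd
            self._unprovision_cmd = unprovision_cmd

        def provision_node(self, rolename: str, hostname: str) -> None:
            # Pass role and hostname to the script; fail loudly if the script fails.
            subprocess.run(self._provision_cmd + [rolename, hostname], check=True)

        def unprovision_node(self, rolename: str, hostname: str) -> None:
            subprocess.run(self._unprovision_cmd + [rolename, hostname], check=True)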

Tap reports have a "StringDescription object" in them

Example:

not ok 5 - webfinger.server.4_2__4_do_not_accept_malformed_resource_parameters2
  ---
  problem: |
    TestProblem(test=TestPlanTestSpec(name='webfinger.server.4_2__4_do_not_accept_malformed_resource_parameters2', disabled=None), exc=AssertionError(<hamcrest.core.string_description.StringDescription object at 0x103973e30>))

That StringDescription probably carries a better error message.
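
The likely fix is to render the exception with str() rather than repr() when building the problem text; str() of the AssertionError raised by hamcrest's assert_that returns the accumulated "Expected: ... but: ..." description, whereas repr() (as seen above) shows the StringDescription object:

    def problem_text(exc: BaseException) -> str:
        """Turn an exception into readable TAP diagnostics text."""
        return str(exc)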

Add test run metadata

Add metadata to the test results output about the specific test run. I'm thinking of information like: timestamp, platform (O/S and version, etc.), user, and hostname. It would also be useful to include the framework's version number, to determine whether a reported problem stems from someone using an old version of the framework with known issues.
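
A sketch of collecting this with the standard library; the exact set of fields is up for discussion:

    import getpass
    import platform
    import socket
    from datetime import datetime, timezone

    def collect_run_metadata(framework_version: str) -> dict:
        """Gather per-run metadata, ready to embed in a report or transcript."""
        return {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "platform": platform.platform(),        # OS name, release, architecture
            "python": platform.python_version(),
            "user": getpass.getuser(),
            "hostname": socket.gethostname(),
            "feditest_version": framework_version,  # helps spot reports from outdated versions
        }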

Support different categories of test outcomes

It might be advantageous if the outcome of a test, or a step in a test, could have more categories than just pass/fail, such as:

  • Pass
  • Hard fail: this will create interop problems.
  • Soft fail: this is against the spec, but will probably not generate interop problems.
  • Degrade: content (or metadata) comes across with degraded semantics, e.g. everything turned into a Note.
  • Not a supported feature.
  • The test itself had a problem.

Maybe the way to implement this is to:

  • Pass: test or test step returns normally.
  • All other cases: an Exception is raised.

The Exceptions come in several flavors:

  • Hamcrest AssertionError and subclasses indicate Hard fail.
  • We create similar exceptions for Hard fail, Soft fail and Degrade.
  • We already have NotImplementedByNodeError for "not supported".
  • All other Exceptions indicate the test itself had a problem.

It might be advantageous to make raising those really concise, along the lines of Hamcrest's assert_that, e.g.

  • hardfail_assert_that
  • softfail_assert_that
  • degrade_assert_that

Implemented like Hamcrest does, which is just a handful of lines: https://github.com/hamcrest/PyHamcrest/blob/main/src/hamcrest/core/assert_that.py
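
A minimal sketch of one of these, mirroring PyHamcrest's own _assert_match; the exception class name is an assumption:

    from hamcrest.core.matcher import Matcher
    from hamcrest.core.string_description import StringDescription

    class SoftAssertionError(AssertionError):
        """Hypothetical flavor meaning: violates the spec, unlikely to break interop."""

    def softfail_assert_that(actual, matcher: Matcher, reason: str = "") -> None:
        """Like hamcrest.assert_that, but raises SoftAssertionError on mismatch."""
        if not matcher.matches(actual):
            description = StringDescription()
            description.append_text(reason)
            description.append_text("\nExpected: ")
            description.append_description_of(matcher)
            description.append_text("\n     but: ")
            matcher.describe_mismatch(actual, description)
            description.append_text("\n")
            raise SoftAssertionError(description)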

Undefined symbol: check_content_type

In the Imp's perform_webfinger_query, it says:

                    if (
                        not check_content_type
                        or ret_pair.response.content_type() == "application/jrd+json"
                        or ret_pair.response.content_type().startswith(
                            "application/jrd+json;"
                        )
                    ):

Is that supposed to be a flag on the method?

Add a time delay after UBOS Node provisioning

We can already specify it in the constellation, I think, but it should be possible for the application / Node to do it as well, so the user doesn't have to worry about it. Mastodon apparently needs a few seconds after ubos-admin deploy has finished before it returns non-404 WebFinger responses.
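
Instead of a fixed sleep, the Mastodon Node (or its driver) could poll WebFinger until it stops returning 404. A sketch, with the resource and timeout as placeholders:

    import time
    import requests

    def wait_until_webfinger_ready(hostname: str, resource: str, timeout: float = 60.0) -> None:
        """Poll the node's WebFinger endpoint until it stops returning 404, or the timeout expires."""
        deadline = time.monotonic() + timeout
        url = f"https://{hostname}/.well-known/webfinger"
        while time.monotonic() < deadline:
            try:
                response = requests.get(url, params={"resource": resource}, timeout=5)
                if response.status_code != 404:
                    return
            except requests.RequestException:
                pass  # server may not even accept connections yet
            time.sleep(2)
        raise TimeoutError(f"{hostname} still returns 404 for WebFinger after {timeout} seconds")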

Test steps not sorted correctly?

inspect.getmembers:

Return all members of an object as (name, value) pairs sorted by name

I'm assuming you want the steps in the order they are defined in the Test class.
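
If definition order is what's wanted, it can be recovered by sorting on each function's first source line (or by iterating the class __dict__, which preserves definition order). The _is_step_ attribute below is a placeholder for however feditest actually tags step methods:

    import inspect

    def steps_in_definition_order(test_class: type) -> list:
        """Return the test class's step methods in the order they appear in the source."""
        steps = [
            value
            for _, value in inspect.getmembers(test_class, predicate=inspect.isfunction)
            if getattr(value, "_is_step_", False)   # placeholder predicate
        ]
        # inspect.getmembers sorts alphabetically; re-sort by source position.
        return sorted(steps, key=lambda f: f.__code__.co_firstlineno)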

Add a run --interactive flag

When the flag is given, and a test fails, feditest needs to stop and ask the user what to do. The options are:

  • n(ext): proceed to the next test in the test plan. This is the same behavior as if no --interactive had been given.
  • a(bort session): stop executing the current test session, shut down its constellation and proceed to run the next test session if there is one in the current test plan
  • q(uit): stop executing the current test session, shut down its constellation and skip all other test sessions that might still be ahead in the test plan
  • r(epeat): repeat the current test

And, if the test is defined as a TestClass:

  • c(ontinue): continue to the next step in the test, ignoring that the current one failed
  • s(tep): repeat the current test step

The main use case for this is debugging. A test just failed; let's stop, poke around, and see what's going on. Perhaps it is easily fixable (say the test setup wasn't entirely correct and it can be fixed manually), which saves us from having to un- and re-provision the constellation.
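
A minimal sketch of the prompt loop; how the returned letter is acted on by the TestRun is left out:

    def prompt_after_failure(is_test_class: bool) -> str:
        """Ask the user what to do after a failed test; returns the single-letter choice."""
        choices = "n(ext), a(bort session), q(uit), r(epeat)"
        allowed = {"n", "a", "q", "r"}
        if is_test_class:
            choices += ", c(ontinue), s(tep)"
            allowed |= {"c", "s"}
        while True:
            answer = input(f"Test failed. {choices}? ").strip().lower()
            if answer and answer[0] in allowed:
                return answer[0]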

Test a `Mention` tag

Deliver a Create of a Note with a tag of type Mention whose href points to a user's actor id. Check that the mentioned user receives a notification of the Note. (This is not specified in AP itself.)
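
A sketch of the payload to deliver, expressed as a Python dict; the actor ids and the recipient handle are placeholders:

    actor_id = "https://sender.example/actor"
    recipient_actor_id = "https://receiver.example/users/alice"

    create_note_with_mention = {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Create",
        "actor": actor_id,
        "to": [recipient_actor_id],
        "object": {
            "type": "Note",
            "attributedTo": actor_id,
            "to": [recipient_actor_id],
            "content": "Hello @alice",
            "tag": [
                {
                    "type": "Mention",
                    "href": recipient_actor_id,   # must point at the mentioned user's actor id
                    "name": "@alice@receiver.example",
                }
            ],
        },
    }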

Create methods on Node (?) to activate verbose logging for debugging purposes

Destination of the log is tbd.

Addresses the problem that "something" goes wrong while running tests / debugging, and the developer needs more info beyond, say, HTTP 500. Each application is different and may be written in a language/framework the developer is not familiar with, so how to activate logging and where the log goes is not obvious. We should put it into the Node abstraction.

Related:

Count as skipped if a Node or NodeDriver does not implement a control/observation method

Right now, everything that throws an exception is counted as failed, everything that has the "disabled" property set is counted as disabled, and the rest of the tests must have succeeded.

This breaks down when a test, or a Node used by the test, throws a NotImplemented.*Error. The idea is that the TestRun can continue and does not need to be modified, just because a participating Node doesn't provide a method by which the test can control or observe it in a particular way. That will probably be fairly common.

Instead, that situation should count as "skipped" as well. Or maybe it could be its own category, so we can say "N tests couldn't be run because Node / app X does not implement API foo".
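
A sketch of the proposed classification; NotImplementedByNodeError stands in for the existing feditest exception (redeclared here so the snippet is self-contained), and the report API used is illustrative:

    class NotImplementedByNodeError(Exception):
        """Placeholder standing in for feditest's real exception of the same name."""

    def run_one_test(test, report) -> None:
        """Classify the outcome of a single test as proposed above."""
        try:
            test.run()
        except NotImplementedByNodeError as e:
            # The Node cannot be controlled/observed this way: not a failure of the app.
            report.record_skipped(test, reason=str(e))
        except AssertionError as e:
            report.record_failed(test, problem=e)
        except Exception as e:
            # The test itself (or the framework) had a problem.
            report.record_errored(test, problem=e)
        else:
            report.record_passed(test)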

Allow pre-defined manually-entered parameters

In the manual Driver, we 1) ask the user questions (such as to specify an account name), 2) make them do things (such as create a post), and 3) have them enter their observations (such as whether a post has shown up).

Some of the user questions (1) can be answered in advance of running a given test plan. For example, to test WebFinger against existing live instances, we might always want to use the same test account. It should be possible to specify which test accounts to use in advance, instead of having to enter them at the console during the test run.

Two ideas:

  • allow answers to such questions to be parameters in the test plan (scoped to the test plan, or scoped to a particular test inside the test plan), or
  • allow answers in a separate parameter file given as a further argument to running a test plan. This alternative has the advantage that the same test plan can run against many nodes without change.
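
A sketch of the second idea: a flat JSON file of pre-answered questions, consulted before falling back to the console. The file format and question keys are assumptions:

    import json

    def load_predefined_answers(path: str | None) -> dict:
        """Load answers to manual-driver questions from a separate parameter file."""
        if path is None:
            return {}
        with open(path, encoding="utf-8") as f:
            return json.load(f)   # e.g. {"server.account_name": "testuser"}

    def ask(question_key: str, prompt: str, predefined: dict) -> str:
        """Use a predefined answer if present, otherwise ask at the console."""
        if question_key in predefined:
            return predefined[question_key]
        return input(prompt)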

Fix unit tests.

The unit tests are broken after recent code changes (class renaming, etc.). At some point, we should consider adding pre-commit hooks to run unit tests and possibly run other tasks like lint checks, etc.

A naming convention for our various kinds of json files?

We have all sorts of JSON files now:

  • Test plans
  • Constellations that can be used to create Test Plans
  • Session templates that can be used to create Test Plans
  • TestRun transcripts produced by run --json

I'm noticing I'm having a bit of difficulty telling from the name of a file what it might be. In particular, I frequently confuse a test plan with the result of executing that test plan. I did put a type field into the TestRun transcripts, which can help a bit, but only after looking inside the file, and it's not necessarily at the beginning of the file either.

Should we come up with some kind of naming convention?
