
feditest's Introduction

Feditest: test federated protocols such as those in the Fediverse

This repo contains:

  • the FediTest test framework

which allows you to define and run test plans that involve constellations of servers (like Fediverse instances) whose communication you want to test.

The actual tests for the Fediverse are in their own repository.

For more details, check out feditest.org and find us on Matrix in #fediverse-testing:matrix.org.

Found a bug? You must be kidding; like in all of Arch Linux, there are no bugs in this software. But if we happen to be wrong, submit a bug report with as much detail as possible right here for this project on GitHub.

feditest's People

Contributors

steve-bate, jernst, mexon


feditest's Issues

Test that applications return descriptive error messages

(from this discussion: https://mastodon.social/@benpate/111161976484128397 )

Just this: there are tons of ways that an API call can fail, but most Fediverse software just returns something unhelpful, like 500 Internal Server Error.

For production software, that’s probably reasonable, but a well-written test suite should help people debug, not just tell them about the bug.

So if I have a voice, a test suite should really return errors like: “this message was not accepted because it is missing a ‘name’ field.”

Not sure what exactly we can do about this here, but let's record it anyway.

Also the follow-up in the thread:

Nomad returns delivery reports indicating the disposition of a message once it hits the server and lets normal people examine the reports for any message they send. We needed this years ago because Diaspora silently dropped 1/3 of all communications and we needed to be able to prove the issue wasn't our software's fault. The assumption was that site operators of that period were often high school kids that couldn't read or interpret log files, so this let us find federation issues without involving them.

It would be no small feat getting this kind of thing standardised and adopted in ActivityPub, but we do provide feedback of the http error code for all sites -- and if something is stuck in the queue, one can see the results of the individual delivery attempts. Nomad sites provide much more detailed answers of what happened - lack of permission, filter/blocking rule, duplicate, recipient not found, delivered, etc...

Create a full test run transcript in JSON

Right now, we have to re-run a TestRun to format a report differently. That's not a good idea given how expensive TestRuns are. Also, reports generally don't show all information that was collected during a TestRun, and if a user decides they needed more detail than a particular report shows, they have to re-run the TestPlan.

Instead:

  1. During a TestRun, silently log everything.
  2. Create an optional output format that dumps all that was recorded into a JSON file.
  3. Create a sub-command that can parse such a JSON log and generate other forms of report from it, without re-running the tests.

There's also the advantage of being able to generate several different reports from the same TestRun, such as a web page and a summary.
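
A minimal sketch of what step 3 could look like, assuming a hypothetical transcript layout with a top-level results list (the real schema is still to be defined):

    import json
    from collections import Counter

    def summarize_transcript(path: str) -> None:
        """Render a summary report from a previously saved TestRun transcript.

        Assumes a hypothetical schema:
        {"type": "testrun-transcript",
         "results": [{"test": ..., "outcome": "passed" | "failed" | "skipped"}, ...]}
        """
        with open(path, encoding="utf-8") as f:
            transcript = json.load(f)

        counts = Counter(result["outcome"] for result in transcript["results"])
        print(f"total: {sum(counts.values())}")
        for outcome in ("passed", "failed", "skipped"):
            print(f"{outcome}: {counts.get(outcome, 0)}")

A sub-command (name to be decided) could then wrap functions like this one for each report format, without re-running the tests.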

Create an implementation of FediverseNode that delegates to the Mastodon client API

Let's call this FediverseNode subclass MastodonClientApiNode.

The first scenario would be something like this:

  • Either modify MastodonUbosNodeDriver or create a version of SaasFediverseNodeDriver whose _provision_node instantiates MastodonClientApiNode.
  • In MastodonClientApiNode, override make_create_note so that, when invoked, it uses the Mastodon client API to actually create a Create activity with a Note, and returns the URL of the created activity.
  • Run a simplified version of DeliverToInboxTest to invoke create_note from a test.

If we can do this, we know we can do the rest of what this MastodonClientApiNode will have to do, too.

Some notes:

  • NodeDriver._provision_node is being handed the hostname in its parameters if specified in the TestPlan file.
  • How to get an oauth token is tbd. Is there a way to do this that:
    1. can be performed from behind the firewall, against a public Mastodon instance, without another cooperating public website? At all?
    2. ... and also can be scripted requiring no human intervention?
    3. If not, can it be automated against a locally running instance? (e.g. by invoking a script that runs a query against the Mastodon database; assume feditest has access to all relevant installation and db data including credentials)
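
For the make_create_note bullet above, a sketch of what the method could look like when backed by the Mastodon client API. The class is shown standalone here (in feditest it would subclass FediverseNode), and the constructor arguments and return value are assumptions; only the status-posting endpoint is standard Mastodon API:

    import requests

    class MastodonClientApiNode:
        """Sketch of a Node that drives an existing Mastodon instance via its client API."""

        def __init__(self, hostname: str, oauth_token: str):
            self._hostname = hostname
            self._oauth_token = oauth_token

        def make_create_note(self, content: str) -> str:
            # POST /api/v1/statuses is Mastodon's standard endpoint for creating a status (a Note).
            response = requests.post(
                f"https://{self._hostname}/api/v1/statuses",
                headers={"Authorization": f"Bearer {self._oauth_token}"},
                data={"status": content},
                timeout=30,
            )
            response.raise_for_status()
            # "uri" is the ActivityPub id of the created object; "url" is the HTML permalink.
            return response.json()["uri"]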

Tap report should only show tap, not the exceptions

E.g. currently:

% cd feditest-tests-sandbox
jernst$ ../feditest/venv/bin/feditest run --tap
2024-05-01T17:08:52Z [ERROR] feditest: FAILED test assertion: sandbox.example_test_with_functions::example_test2: 
Expected: <-56>
     but: was <0>
 Traceback (most recent call last):

  File "/Users/jernst/git/github.com/fediverse-devnet/feditest/venv/lib/python3.12/site-packages/feditest/__init__.py", line 91, in run
    self.test_function(**args)

  File "/Users/jernst/git/github.com/fediverse-devnet/feditest-tests-sandbox/tests/sandbox/example_test_with_functions.py", line 51, in example_test2
    assert_that(c, equal_to(-56))

  File "/Users/jernst/git/github.com/fediverse-devnet/feditest/venv/lib/python3.12/site-packages/hamcrest/core/assert_that.py", line 58, in assert_that
    _assert_match(actual=actual_or_assertion, matcher=matcher, reason=reason)

  File "/Users/jernst/git/github.com/fediverse-devnet/feditest/venv/lib/python3.12/site-packages/hamcrest/core/assert_that.py", line 73, in _assert_match
    raise AssertionError(description)

AssertionError: 
Expected: <-56>
     but: was <0>


TAP version 14
# test plan: None
# session: session_0
# constellation: A_vs_1
#   name: A_vs_1
#   roles:
#     - name: client
#       driver: sandbox.SandboxMultClientDriver_ImplementationA
#     - name: server
#       driver: sandbox.SandboxMultServerDriver_Implementation1
ok 1 - sandbox.example_test_with_functions::example_test1
ok 2 - sandbox.example_test_with_functions::example_test2
ok 3 - sandbox.example_test_with_classes::ExampleTest1
# session: session_1
# constellation: A_vs_2
#   name: A_vs_2
#   roles:
#     - name: client
#       driver: sandbox.SandboxMultClientDriver_ImplementationA
#     - name: server
#       driver: sandbox.SandboxMultServerDriver_Implementation2Faulty
ok 4 - sandbox.example_test_with_functions::example_test1
not ok 5 - sandbox.example_test_with_functions::example_test2
  ---
  problem: |
    Expected: <-56>
         but: was <0>
  ...
ok 6 - sandbox.example_test_with_classes::ExampleTest1
1..6
# test run summary:
#   total: 6
#   passed: 5
#   failed: 1
#   skipped: 0

Everything before TAP version 14 should go.
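
One possible way to get there, assuming the noise above is emitted via the standard logging module under the "feditest" logger: while producing TAP, route log records into a buffer (or a file) instead of the console.

    import io
    import logging

    def capture_logs_during_tap_run() -> io.StringIO:
        """Redirect feditest's log output into a buffer so stdout stays pure TAP."""
        buffer = io.StringIO()
        handler = logging.StreamHandler(buffer)
        handler.setFormatter(logging.Formatter("%(asctime)s [%(levelname)s] %(name)s: %(message)s"))

        logger = logging.getLogger("feditest")
        logger.handlers.clear()      # drop the console handler for the duration of the run
        logger.addHandler(handler)
        logger.propagate = False
        return buffer                # could be appended as "# ..." TAP diagnostics, or discarded
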

In HTML reports, show app, not NodeDriver

Right now we say, for example:

client: imp.ImpInProcessNodeDriver
server: saas.SaasFediverseNodeDriver

or

server: mastodon.MastodonUbosNodeDriver

This should be a representation of Node (e.g. Mastodon), not NodeDriver (e.g. mastodon.MastodonUbosNodeDriver)

Somewhere in the constellation details it can say what driver was used.

Capture metadata for the fediverse server applications we want to test

There may be all sorts of things that we want to capture, and we need to come up with a way to do this. This includes:

  • Platforms on which the server application runs.
  • Whether it requires HTTPS to communicate with it, even during testing.
  • Which "profiles" it supports, so we don't attempt to run tests that are known to fail. (How to define profiles is a separate issue.)
  • Whether the server application will have certain behaviors or not. Example: a server application that is known to never create ephemeral objects (as defined in the AP spec) can and should be tested to always emit resolvable IDs.
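
A sketch of what such a metadata record might look like as a Python dataclass; all field names and example values are illustrative:

    from dataclasses import dataclass, field

    @dataclass
    class AppMetadata:
        """Per-application metadata as discussed above; shape is a proposal, not the current code."""
        name: str
        platforms: list[str] = field(default_factory=list)   # e.g. ["ubos", "saas"]
        requires_https: bool = True                           # even during testing
        supported_profiles: list[str] = field(default_factory=list)
        never_creates_ephemeral_objects: bool = False         # if True, always expect resolvable ids

    mastodon = AppMetadata(name="Mastodon", platforms=["ubos"], requires_https=True)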

Clearly define how role names in constellations map to parameters in the actual tests

The original plan -- simply matching them by name -- does not work, because a given test plan may run tests from multiple test groups (like WebFinger and ActivityPub), and the role names for the nodes in those test groups are different.

We could go by sequence, but that would make it difficult to include tests in the same test plan that do not use all of the nodes that other tests in the same test plan use. But: is that really a requirement?
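
One possible direction, purely hypothetical: let each test plan entry map the test's own role names onto constellation role names explicitly, so names stay local to each test group. The field name and shape below are made up for illustration:

    # Hypothetical shape of an explicit role mapping in a test plan entry.
    # Keys are the role names the test declares; values are constellation role names.
    test_spec = {
        "name": "webfinger.server.some_test",   # placeholder test name
        "rolemapping": {
            "client": "A",   # the test's "client" role is played by constellation node "A"
            "server": "1",
        },
    }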

Create an implementation of FediverseNode that acts as a gRPC client

Similar to #94, that would enable any Fediverse application, not just those implementing the Mastodon client API, to participate in FediTest automation.

gRPC seems to be the protocol of choice.

We don't really need a server implementation; we just need to know how to get to the generated interfaces in various languages.

Subprocess Driver?

Do you think it would be useful to have a subprocess driver that would support provisioning and unprovisioning a node using a shell script (or some other program)? This would be a more automated approach than the manual driver, but not as full-featured as the UBOS driver. It might also be useful for integration testing of the framework outside of a UBOS container.
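
A rough sketch of the idea, shelling out to user-supplied scripts; the class and method names and parameters are illustrative, not the actual NodeDriver API:

    import subprocess

    class ShellScriptNodeDriver:
        """Provision/unprovision a Node by running user-supplied scripts."""

        def __init__(self, provision_cmd: list[str], unprovision_cmd: list[str]):
            self._provision_cmd = provision_cmd
            self._unprovision_cmd = unprovision_cmd

        def provision_node(self, rolename: str, hostname: str) -> None:
            # Pass role and hostname to the script; fail loudly if the script fails.
            subprocess.run(self._provision_cmd + [rolename, hostname], check=True)

        def unprovision_node(self, rolename: str, hostname: str) -> None:
            subprocess.run(self._unprovision_cmd + [rolename, hostname], check=True)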

Tap reports have a "StringDescription object" in them

Example:

not ok 5 - webfinger.server.4_2__4_do_not_accept_malformed_resource_parameters2
  ---
  problem: |
    TestProblem(test=TestPlanTestSpec(name='webfinger.server.4_2__4_do_not_accept_malformed_resource_parameters2', disabled=None), exc=AssertionError(<hamcrest.core.string_description.StringDescription object at 0x103973e30>))

That StringDescription probably carries a better error message.
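
The likely fix is to render the exception with str() rather than repr() when building the problem text; str() of the AssertionError raised by hamcrest's assert_that returns the accumulated "Expected: ... but: ..." description, whereas repr() (as seen above) shows the StringDescription object:

    def problem_text(exc: BaseException) -> str:
        """Turn an exception into readable TAP diagnostics text."""
        return str(exc)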

Add test run metadata

Add metadata to the test results output about the specific test run. I'm thinking of information like: timestamp, platform (O/S and version, etc.), user, and hostname. It would also be useful to include the framework's version number, to determine whether a reported problem stems from someone using an old version of the framework with known issues.
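
A sketch of collecting this with the standard library; the exact set of fields is up for discussion:

    import getpass
    import platform
    import socket
    from datetime import datetime, timezone

    def collect_run_metadata(framework_version: str) -> dict:
        """Gather per-run metadata, ready to embed in a report or transcript."""
        return {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "platform": platform.platform(),        # OS name, release, architecture
            "python": platform.python_version(),
            "user": getpass.getuser(),
            "hostname": socket.gethostname(),
            "feditest_version": framework_version,  # helps spot reports from outdated versions
        }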

Support different categories of test outcomes

It might be advantageous if the outcome of a test, or a step in a test, could have more categories than just pass/fail, such as:

  • Pass
  • Hard fail: this will create interop problems.
  • Soft fail: this is against the spec, but will probably not generate interop problems.
  • Degrade: content (or metadata) comes across with degraded semantics, e.g. everything turned into a Note.
  • Not a supported feature.
  • The test itself had a problem.

Maybe the way to implement this is to:

  • Pass: test or test step returns normally.
  • All other cases: an Exception is raised.

The Exceptions come in several flavors:

  • Hamcrest AssertionError and subclasses indicate Hard fail.
  • We create similar exceptions for Hard fail, Soft fail and Degrade.
  • We already have NotImplementedByNodeError for "not supported".
  • All other Exceptions indicate the test itself had a problem.

It might be advantageous to make raising those really concise, along the lines of Hamcrest's assert_that, e.g.

  • hardfail_assert_that
  • softfail_assert_that
  • degrade_assert_that

Implemented like Hamcrest does, which is just a handful of lines: https://github.com/hamcrest/PyHamcrest/blob/main/src/hamcrest/core/assert_that.py
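
A minimal sketch of one of these, mirroring PyHamcrest's own _assert_match; the exception class name is an assumption:

    from hamcrest.core.matcher import Matcher
    from hamcrest.core.string_description import StringDescription

    class SoftAssertionError(AssertionError):
        """Hypothetical flavor meaning: violates the spec, unlikely to break interop."""

    def softfail_assert_that(actual, matcher: Matcher, reason: str = "") -> None:
        """Like hamcrest.assert_that, but raises SoftAssertionError on mismatch."""
        if not matcher.matches(actual):
            description = StringDescription()
            description.append_text(reason)
            description.append_text("\nExpected: ")
            description.append_description_of(matcher)
            description.append_text("\n     but: ")
            matcher.describe_mismatch(actual, description)
            description.append_text("\n")
            raise SoftAssertionError(description)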

Undefined symbol: check_content_type

In the Imp's perform_webfinger_query, it says:

                    if (
                        not check_content_type
                        or ret_pair.response.content_type() == "application/jrd+json"
                        or ret_pair.response.content_type().startswith(
                            "application/jrd+json;"
                        )
                    ):

Is that supposed to be a flag on the method?

Add a time delay after UBOS Node provisioning

We can already specify it in the constellation, I think, but it should be possible for the application / Node to do it as well, so the user doesn't have to worry about it. Mastodon apparently needs a few seconds after ubos-admin deploy has finished before it returns non-404 WebFinger responses.
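
Instead of a fixed sleep, the Mastodon Node (or its driver) could poll WebFinger until it stops returning 404. A sketch, with the resource and timeout as placeholders:

    import time
    import requests

    def wait_until_webfinger_ready(hostname: str, resource: str, timeout: float = 60.0) -> None:
        """Poll the node's WebFinger endpoint until it stops returning 404, or the timeout expires."""
        deadline = time.monotonic() + timeout
        url = f"https://{hostname}/.well-known/webfinger"
        while time.monotonic() < deadline:
            try:
                response = requests.get(url, params={"resource": resource}, timeout=5)
                if response.status_code != 404:
                    return
            except requests.RequestException:
                pass  # server may not even accept connections yet
            time.sleep(2)
        raise TimeoutError(f"{hostname} still returns 404 for WebFinger after {timeout} seconds")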

Test steps not sorted correctly?

inspect.getmembers:

Return all members of an object as (name, value) pairs sorted by name

I'm assuming you want the steps in the order they are defined in the Test class.
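
If definition order is what's wanted, it can be recovered by sorting on each function's first source line (or by iterating the class __dict__, which preserves definition order). The _is_step_ attribute below is a placeholder for however feditest actually tags step methods:

    import inspect

    def steps_in_definition_order(test_class: type) -> list:
        """Return the test class's step methods in the order they appear in the source."""
        steps = [
            value
            for _, value in inspect.getmembers(test_class, predicate=inspect.isfunction)
            if getattr(value, "_is_step_", False)   # placeholder predicate
        ]
        # inspect.getmembers sorts alphabetically; re-sort by source position.
        return sorted(steps, key=lambda f: f.__code__.co_firstlineno)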

Add a run --interactive flag

When the flag is given, and a test fails, feditest needs to stop and ask the user what to do. The options are:

  • n(ext): proceed to the next test in the test plan. This is the same behavior as if no --interactive had been given.
  • a(bort session): stop executing the current test session, shut down its constellation and proceed to run the next test session if there is one in the current test plan
  • q(uit): stop executing the current test session, shut down its constellation and skip all other test sessions that might still be ahead in the test plan
  • r(epeat): repeat the current test

And, if the test is defined as a TestClass:

  • c(ontinue): continue to the next step in the test, ignoring that the current one failed
  • s(tep): repeat the current test step

The main use case for this is debugging. A test just failed; let's stop, poke around, and see what's going on. Perhaps it is easily fixable (say the test setup wasn't entirely correct and it can be fixed manually), which saves us from having to un- and re-provision the constellation.
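
A minimal sketch of the prompt loop; how the returned letter is acted on by the TestRun is left out:

    def prompt_after_failure(is_test_class: bool) -> str:
        """Ask the user what to do after a failed test; returns the single-letter choice."""
        choices = "n(ext), a(bort session), q(uit), r(epeat)"
        allowed = {"n", "a", "q", "r"}
        if is_test_class:
            choices += ", c(ontinue), s(tep)"
            allowed |= {"c", "s"}
        while True:
            answer = input(f"Test failed. {choices}? ").strip().lower()
            if answer and answer[0] in allowed:
                return answer[0]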

Test a `Mention` tag

Deliver a Create of a Note with a tag of type Mention whose href points to a user's actor id. Check that the mentioned user receives a notification of the Note. (This is not specified in AP itself.)
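
A sketch of the payload to deliver, expressed as a Python dict; the actor ids and the recipient handle are placeholders:

    actor_id = "https://sender.example/actor"
    recipient_actor_id = "https://receiver.example/users/alice"

    create_note_with_mention = {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Create",
        "actor": actor_id,
        "to": [recipient_actor_id],
        "object": {
            "type": "Note",
            "attributedTo": actor_id,
            "to": [recipient_actor_id],
            "content": "Hello @alice",
            "tag": [
                {
                    "type": "Mention",
                    "href": recipient_actor_id,   # must point at the mentioned user's actor id
                    "name": "@alice@receiver.example",
                }
            ],
        },
    }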

Create methods on Node (?) to activate verbose logging for debugging purposes

Destination of the log is tbd.

Addresses the problem that "something" goes wrong while running tests / debugging, and the developer needs more info beyond, say, HTTP 500. Each application is different and may be written in a language/framework the developer is not familiar with, so how to activate logging and where the log goes is not obvious. We should put it into the Node abstraction.

Related:

Count as skipped if a Node or NodeDriver does not implement a control/observation method

Right now, everything that throws an exception is counted as failed, everything that has the "disabled" property set is counted as disabled, and the rest of the tests must have succeeded.

This breaks down when a test, or a Node used by the test, throws a NotImplemented.*Error. The idea is that the TestRun can continue and does not need to be modified, just because a participating Node doesn't provide a method by which the test can control or observe it in a particular way. That will probably be fairly common.

Instead, that situation should count as "skipped" as well. Or maybe it could be its own category, so we can say "N tests couldn't be run because Node / app X does not implement API foo".
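
A sketch of the proposed classification; NotImplementedByNodeError stands in for the existing feditest exception (redeclared here so the snippet is self-contained), and the report API used is illustrative:

    class NotImplementedByNodeError(Exception):
        """Placeholder standing in for feditest's real exception of the same name."""

    def run_one_test(test, report) -> None:
        """Classify the outcome of a single test as proposed above."""
        try:
            test.run()
        except NotImplementedByNodeError as e:
            # The Node cannot be controlled/observed this way: not a failure of the app.
            report.record_skipped(test, reason=str(e))
        except AssertionError as e:
            report.record_failed(test, problem=e)
        except Exception as e:
            # The test itself (or the framework) had a problem.
            report.record_errored(test, problem=e)
        else:
            report.record_passed(test)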

Allow pre-defined manually-entered parameters

In the manual Driver, we 1) ask the user questions (such as to specify an account name), 2) make them do things (such as create a post), and 3) have them enter their observations (such as whether a post has shown up).

Some of the user questions (1) can be answered in advance of running a given test plan. For example, to test WebFinger against existing live instances, we might always want to use the same test account. It should be possible to specify which test accounts to use in advance, instead of having to enter them at the console during the test run.

Two ideas:

  • allow answers to such questions to be parameters in the test plan (scoped to the test plan, or scoped to a particular test inside the test plan), or
  • allow answers in a separate parameter file given as a further argument to running a test plan. This alternative has the advantage that the same test plan can run against many nodes without change.
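
A sketch of the second idea: a flat JSON file of pre-answered questions, consulted before falling back to the console. The file format and question keys are assumptions:

    import json

    def load_predefined_answers(path: str | None) -> dict:
        """Load answers to manual-driver questions from a separate parameter file."""
        if path is None:
            return {}
        with open(path, encoding="utf-8") as f:
            return json.load(f)   # e.g. {"server.account_name": "testuser"}

    def ask(question_key: str, prompt: str, predefined: dict) -> str:
        """Use a predefined answer if present, otherwise ask at the console."""
        if question_key in predefined:
            return predefined[question_key]
        return input(prompt)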

Fix unit tests.

The unit tests are broken after recent code changes (class renaming, etc.). At some point, we should consider adding pre-commit hooks to run unit tests and possibly run other tasks like lint checks, etc.

A naming convention for our various kinds of json files?

We have all sorts of JSON files now:

  • Test plans
  • Constellations that can be used to create Test Plans
  • Session templates that can be used to create Test Plans
  • TestRun transcripts produced by run --json

I'm noticing I'm having a bit of difficulty telling from the name of a file what it might be. In particular, I frequently confuse a test plan with the result of executing that test plan. I did put a type field into the TestRun transcripts, which can help a bit, but only after looking inside the file, and it's not necessarily at the beginning of the file either.

Should we come up with some kind of naming convention?
