stevana / quickcheck-state-machine Goto Github PK

View Code? Open in Web Editor NEW

This project forked from advancedtelematic/quickcheck-state-machine

10.0 10.0 11.0 2.89 MB

Test monadic programs using state machine based models

License: Other

Haskell 97.88% Makefile 0.09% TeX 1.91% Nix 0.09% Emacs Lisp 0.04%

quickcheck-state-machine's People

Contributors

Stargazers

Watchers

Forkers

edsko srid sdzx-1 locallycompact the-dr-lazy j-mueller jasagredo jorisdral larrytheliquid erikd renateeilers

quickcheck-state-machine's Issues

Move repetitions of parallel state machine cases upwards in the chain

Some of the recent complications that were added/proposed to QSM were due to the fact that parallel testing will perform multiple repetitions, so what seems to be a unique execution from outside is in fact multiple executions. I am wondering whether this is a good thing. This just leads to a lot of complications, namely:

State machines have to be initialized with WithSetup variants to make sure it happens for each repetition
That state machine's scope cannot escape the execution, therefore we can't check properties on the environment
We provide multiple histories to the prettyParallelCommands function, which then has to do some shenanigans to juggle those executions (some might fail, others don't).

I wonder if all these complications just go away if instead of running the state machine N times in runParallelCommands (10 by default) we would instead generate one case and run 10 times a single execution of the parallel set of commands. I think it should be equivalent in testing power and the execution of the state machine will be isolated, which means that we could get rid of the WithSetup variants, there would be no need for finalCheck and prettyParallelCommands would be simplified.

The only tradeoff I can think of is that we would not be able to provide information on whether some repetitions passed and others failed. But perhaps this could still be done by grouping all the properties?

Missing upper bounds

These packages miss upper bounds: - QuickCheck - aeson - ansi-wl-pprint - array - bifunctors - bytestring - containers - directory - doctest - exceptions - filelock - filepath - generic-data - graphviz - hashable - hashtables - hs-rqlite - http-client - markov-chain-usage-model - matrix - monad-logger - mtl - network - persistent-postgresql - postgresql-simple - pretty-show - process - quickcheck-instances - random - resource-pool - resourcet - servant - servant-client - servant-server - sop-core - split - stm - strict - string-conversions - tasty - tasty-hunit - tasty-quickcheck - text - time - tree-diff - unliftio - unliftio-core - vector - wai - warp Please add them, using cabal gen-bounds for suggestions. For more information see: https://pvp.haskell.org/

GHC 9.2.1 compatibility

Hi, I assume this is to do with 9.2.1, but I can't see what has changed exactly.

[ 2 of 21] Compiling Test.StateMachine.Lockstep.Auxiliary ( src/Test/StateMachine/Lockstep/Auxiliary.hs, dist/build/Test/State>

src/Test/StateMachine/Lockstep/Auxiliary.hs:64:13: error:
    • Could not deduce: (Top a => Elem xs a -> g a -> m (h a))
                        ~ (Elem xs a -> g a -> m (h a))
      from the context: (NTraversable f, Applicative m, SListI xs)
        bound by the type signature for:
                   ntraverse :: forall {k} (f :: (k -> *) -> [k] -> *) (m :: * -> *)
                                       (xs :: [k]) (g :: k -> *) (h :: k -> *).
                                (NTraversable f, Applicative m, SListI xs) =>
                                (forall (a :: k). Elem xs a -> g a -> m (h a))
                                -> f g xs -> m (f h xs)
        at src/Test/StateMachine/Lockstep/Auxiliary.hs:(61,1)-(63,33)
      Expected: (forall (a :: k). Elem xs a -> g a -> m (h a))
                -> f g xs -> m (f h xs)
        Actual: (forall (a :: k). Top a => Elem xs a -> g a -> m (h a))
                -> f g xs -> m (f h xs)
    • In the expression: nctraverse (Proxy @Top)
      In an equation for ‘ntraverse’: ntraverse = nctraverse (Proxy @Top)
    • Relevant bindings include
        ntraverse :: (forall (a :: k). Elem xs a -> g a -> m (h a))
                     -> f g xs -> m (f h xs)
          (bound at src/Test/StateMachine/Lockstep/Auxiliary.hs:64:1)
   |
64 | ntraverse = nctraverse (Proxy @Top)
   |             ^^^^^^^^^^^^^^^^^^^^^^^

Release QSM 0.9.0 in Hackage

We have adopted the no-vendored-treediff sublibrary in our test-suites (see here using this adaptor module).

However, we can't release our code until QSM 0.9.0 is released. We can upload a temporal fork-ish version to CHaP with version number 0.8.0.0.0.0.0.1 (like we have done with persistent for example see) until we get Hackage's 0.9.0 but if you were planning to do a release soon we might as well just avoid doing that and depend on 0.9.0 from Hackage directly.

Making a release with monadic initialization

The way one defines state machines after #12 has changed to requiring that the initialization is monadic. Could a new release of quickcheck-state-machine be published with these changes to Hackage? We would like to get rid of source-repository-packages in our project 😄

quickcheck-state-machine-0.8.0 test fail to compile

/home/curators/work/unpack-dir/unpacked/quickcheck-state-machine-0.8.0-c332697455549939e501aea2fc1681fa710d03d5ae2326b49ccfc60c52d0b72e/test/SQLite.hs:463:37: error:
           Ambiguous occurrence ‘open'’                                                                                            It could refer to
              either ‘Database.Sqlite.open'’,                                                                                                imported from ‘Database.Sqlite’ at test/SQLite.hs:(49,1)-(50,25)                                                     or ‘SQLite.open'’, defined at test/SQLite.hs:473:1                                                               |                                                                                                                   463 |     liftIO $ asyncQueueBound 1000 $ open' str logFunc                                                                 |                                     ^^^^^

Disable them during stackage build for now.

Change name of `forall` to something else in anticipation for upcoming GHC change

The GHC Proposal 281 “Visible forall” proposes to make ‘forall’ a keyword, so it will no longer be a valid identifier. Library authors are advised to use ‘forAll’, ‘forall_’, or any other name instead. The proposal is currently underconsideration by the GHC Steering Committee and is recommended for acceptance.

Gating upstream `tree-diff` behind an opt-in flag

It turns out that a lot of code (at least for us in IOG) depends on the hackage tree-diff, and we have lots of instances for it. I see there is an issue with the licenses, perhaps because you may want QSM to be usable on closed-source environments?

However how would you feel about gating the dependency behind a flag, such that users of QSM could decide whether they want to use the upstream tree-diff (and be GPL compliant) or the in-tree one? It would basically consist on:

flag vendored-tree-diff
  default: True
  manual: True

library 
  ...
  if flag(vendored-tree-diff)
  then
    exposed-modules: 
      Test.StateMachine.TreeDiff
      Test.StateMachine.TreeDiff.Class
      Test.StateMachine.TreeDiff.Expr
      Test.StateMachine.TreeDiff.List
      Test.StateMachine.TreeDiff.Pretty
      Test.StateMachine.TreeDiff.Tree
    build-depends:
      ... -- dependencies of tree-diff
  else
    build-depends:
      tree-diff
    mixins:
      tree-diff (Data.TreeDiff.Class as Test.StateMachine.TreeDiff, ...)

This way QSM would remain BSD-3 but users who want to keep up with the upstream tree-diff can still opt-in for it. It kind of emulates the way it works in GHC with gmp and bignum.

I have not tried the above, but I suspect there would be a way to make it work.

@stevana (pinging because it is a question directly for you more than an issue to solve eventually)

Dot drawing is misaligned

When working on some tests I was trying to understand the flow of execution by inspecting the Dot output of the test. However this is misleading, and in fact wrong (or I have erroneous expectations). The terminal output I get is the following (cropped to make it smaller):

┌─────────────────────────────────────────────────────────────────────────────────┐
│ [] ← Re15 [App "_\215_" [App "\"aa80ae11056b9ee101ddb0db7f03                    │
│                                                                 → App "Void" [] │
└─────────────────────────────────────────────────────────────────────────────────┘
                                           │ ┌────────────────────────────────────┐
                                           │ │ [] ← App "TryAd                    │
                                           │ │                    → App "Void" [] │
                                           │ └────────────────────────────────────┘
                                           │ ┌────────────────────────────────────┐
                                           │ │ [] ← App "SyncL                    │
                                           │ │                    → App "Void" [] │
                                           │ └────────────────────────────────────┘
┌────────────────────────────────────────┐ │                                       
│ [] ← Re_"Fork",App "IsReachable" [])]) │ │                                       
│                        → App "Void" [] │ │                                       
└────────────────────────────────────────┘ │
┌────────────────────────────────────────┐ │                                       
│ [] ← Re                                │ │                                       
│                        → App "Void" [] │ │                                       
└────────────────────────────────────────┘ │
┌────────────────────────────────────────┐ │                                       
│ [] ← Ap                                │ │                                       
│                        → App "Void" [] │ │                                       
└────────────────────────────────────────┘ │                                       
                                           │ ┌────────────────────────────────────┐
                                           │ │ [] ← App "GetSn                    │
                                           │ │ → Rec "GotSnaps[])]),App "1" []]])]) │
                                           │ └────────────────────────────────────┘

where inspecting the ParallelCommands outputted by QuickCheck, I can identify the following parallel Pairs:

┌─────────────────────────────────────────────────────────────────────────────────┐
│ [] ← Re15 [App "_\215_" [App "\"aa80ae11056b9ee101ddb0db7f03                    │
│                                                                 → App "Void" [] │
└─────────────────────────────────────────────────────────────────────────────────┘
===================================================================================
                                           │ ┌────────────────────────────────────┐
                                           │ │ [] ← App "TryAd                    │
                                           │ │                    → App "Void" [] │
                                           │ └────────────────────────────────────┘
===================================================================================
                                           │ ┌────────────────────────────────────┐
                                           │ │ [] ← App "SyncL                    │
                                           │ │                    → App "Void" [] │
                                           │ └────────────────────────────────────┘
┌────────────────────────────────────────┐ │                                       
│ [] ← Re_"Fork",App "IsReachable" [])]) │ │                                       
│                        → App "Void" [] │ │                                       
└────────────────────────────────────────┘ │
===================================================================================
┌────────────────────────────────────────┐ │                                       
│ [] ← Re                                │ │                                       
│                        → App "Void" [] │ │                                       
└────────────────────────────────────────┘ │
===================================================================================
┌────────────────────────────────────────┐ │                                       
│ [] ← Ap                                │ │                                       
│                        → App "Void" [] │ │                                       
└────────────────────────────────────────┘ │                                       
                                           │ ┌────────────────────────────────────┐
                                           │ │ [] ← App "GetSn                    │
                                           │ │ → Rec "GotSnaps[])]),App "1" []]])]) │
                                           │ └────────────────────────────────────┘

However the Dot output is this one, where I have manually drawn the parallel Pairs:

I would have expected nodes from different Pairs to not appear at the same height. Were my expectations wrong?

Generalize Lockstep to allow many-to-many relationship between real and mock handles

Currently the lockstep infrastructure insists on a one-to-one mapping between real handles (used in the system under test) and mock handles (used in the model); we maintain this mapping as the test executes (Refs, Refss). This is fine for most tests, but not for all. Suppose we are testing a database, and suppose that that database might reuse primary keys under certain circumstances. Suppose furthermore that we want to treat the exact value of the primary key as an implementation detail -- we do not want to insist that the model chooses precisely the same primary keys as the system does (see also 7f1d939), and we must therefore use handles to represent them. Now if the model does not reuse primary keys, but the real system does, we no longer have a one-to-one mapping between these two.

The lockstep infrastructure uses a single post-condition: the system under test and the model should give the same response, modulo handles. The way that it currently does this is by translating a response in terms of real handles to a response in terms of mock handles (using the 1-1 mapping), and then do an equality check. The key idea that @IsaacR24 and I explored was this: we don't really need to check that the handles match. After all, if they are new handles, then the match is trivially satisfied, and if they existing handles, then a mismatch in handles should be detectable by a later command that uses those handles. Thus, we could check whether there is some translation of handles that would make the two responses match.

Indeed, we could forgo this altogether -- after all, it's anyway only a best effort to try to detect the mismatch a bit earlier. We could instead translate both responses to contain no handles at all, and then compare those. That should still be fine, in terms of what we are testing.

That was the theory, but actually it does not currently work in in practice. To see why, consider this function that QSM demands we implement:

postcondition :: model Concrete -> cmd Concrete -> resp Concrete -> Logic

In order to implement this postcondition for the lockstep philosophy, we need to compute the corresponding mock response, and compare. We can do that, because we can translate all those concrete references into mock references using our 1-1 mapping, and then execute the mock semantics. However, if we no longer have that 1-1 mapping, this is no longer an option.
I suppose we might try to see if any mapping gives us a corresponding response, but that feels rather unpleasant. It's also not the only place where this is a problem; indeed, function lockstep, so central to the whole approach, would not be definable with this generalization. Searching for some mapping that works might work okay for postcondition, but for example if we were going through an entire history of commands, then this may well lead to combinatorial explosion.

I don't think that this problem is unsolvable, but it requires some rethinking, and I think it almost certainly requires some changes to the underlying QSM abstractions. Specifically, functions like postcondition are omitting information: when QSM executed the test, it generates commands in terms of symbolic references, before translating them to concrete references -- but we do not have the symbolic references anymore here. If we did, then the ambiguity would disappear; in both the system under test and the model, a symbolic reference is unambiguous: the reference returned there ; it then doesn't matter anymore if the relation between real handles and mock handles isn't 1-1 anymore:

Mock                             | Real 
-------------------------------------------------------------------
Cmd1: returns MockHandle1 (Ref1) | Cmd1: returns RealHandle1 (Ref1)
Cmd2: returns MockHandle2 (Ref2) | Cmd2: returns RealHandle1 (Ref2)
Cmd3: Use Ref1                   | Cmd3: Use Ref1

It doesn't matter now that RealHandle1 might correspond to MockHandle1 or MockHandle2: the symbolic reference resolves the ambiguity.

So I think this is solvable. I took a look, but I think changing this so that we get the information we need out of QSM would require changes in quite a few places. Nothing terribly difficult, at first glance, but I do not currently have enough time to look at this. So for now just this detailed description. Something to consider in the future.

RQLite tests fail on current master

The failure is as follows:

Tests
  Rqlite
    parallel: 
┌┐
│ Spawn 0 1s 1s Nothing │
│ → user error (openTCPConnection: failed to connect to 172.19.0.10:4001: Network.Socket.connect: <socket: 103>: does not exist (Connection refused)) │
└┘

AnnotateC "ExceptionThrown \"user error (openTCPConnection: failed to connect to 172.19.0.10:4001: Network.Socket.connect: <socket: 103>: does not exist (Connection refused))\"" BotC

FAIL (11.65s)
      *** Failed! Falsified (after 1 test and 4 shrinks):
      ParallelCommands
        { prefix =
            Commands
              { unCommands =
                  [ Command
                      Spawn
                      0
                      1
                      s
                      1
                      s
                      Nothing
                      Resp
                      (Right (Spawned (Reference (Symbolic (Var 0)))))
                      [ Var 0 ]
                  ]
              }
        , suffixes = []
        }
      Use --quickcheck-replay=166065 to reproduce.

Reenable ProcessRegistry test

The ProcessRegistry test was disabled in #25, when the Markov module was removed. It can be reenabled again by reimplementing the generator (without Markov).

Missing opportunities for shrinking

I have the suspicion that there are test cases which could be shrunk even more than currently the parallel state machine does, just by removing empty pairs and one-sided pairs.

In my tests I'm using a simple shrinker that don't offer alternatives for any commands, so I expect this to shrink only the original sequence of commands, by removing commands or moving them to the prefix.

shrinker _ _ = []

However, the final counterexample often contains empty parallel groups:

  , suffixes =
      [ Pair
          { proj1 = Commands { unCommands = [] }
          , proj2 = Commands { unCommands = [] }
          }

which should be shrunk, and it also contains "one-side" pairs:

      , Pair
          { proj1 = Commands { unCommands = [] }
          , proj2 =
              Commands
                { unCommands =
                    [ Command
                        ...
                        []
                    ]
                }
          }

In particular I'm seeing now a test case with the following shrunk sequence of commands:

*** Failed! Falsified (after 67 tests and 22 shrinks):     
ParallelCommands
  { prefix =
      Commands
        { unCommands =
            [ Command
                ...
                ...
                []
            , Command
                ...
                ...
                []
            , Command ... ... []
            , Command
                ...
                ...
                []
            ]
        }
  , suffixes =                             
      [ Pair             -- empty pair
          { proj1 = Commands { unCommands = [] }
          , proj2 = Commands { unCommands = [] }
          }
      , Pair            -- left sided pair
          { proj1 =
              Commands
                { unCommands =
                    [ Command
                        ...
                        ...
                        []
                    , Command ... ... []
                    ]
                }
          , proj2 = Commands { unCommands = [] }
          }
      , Pair            -- right sided pair
          { proj1 = Commands { unCommands = [] }
          , proj2 =
              Commands
                { unCommands =
                    [ Command
                        ...
                        ...
                        []
                    ]
                }
          }
      ]
  }

where everything happens (or should happen) sequentially and therefore it would only make sense to shrink it by moving everything into the sequential prefix.

Add option to test properties on isolated environment

PR #12 introduced withSetup variants to run commands on monadic state machines, so that environments could be isolated between different parallel executions of a state machine.

sm :: m (StateMachine model cmd m resp)
sm = do
  env <- ...
  pure $ StateMachine {
	...
	semantics = semantics env
	...
}

I'd like to test properties concerning the state of the environment at the end of every run. To do so, I propose to add a field finalPostcondition :: Property (alternative naming suggestions are welcome) to StateMachine. We can then return this property at the end of executing commands, e.g. the return type of executeNParallelCommands would go from m (History cmd resp, model Concrete, Reason) to m (History cmd resp, model Concrete, Reason, Property). Then it could be propagated via runNParallelCommandsWithSetup and relevant variants to prettyNParallelCommands where a sensible message could be printed upon failure.

Feedback would be highly appreciated!

(Thanks @jasagredo for the idea)