ocaml-multicore / dscheck Goto Github PK

Experimental model checker for testing concurrent algorithms

License: ISC License

OCaml 99.17% Shell 0.83%

dscheck's Introduction

📢 Note 📢

Multicore OCaml project has now been merged into OCaml 🎉. This repository is no longer developed or maintained. Please follow the updates at the OCaml Github repository.

Citation

If you are citing this work in an academic paper, please cite the ICFP 2020 paper "Retrofitting Parallelism onto OCaml": https://dl.acm.org/doi/10.1145/3408995

Branch `trunk`	Branch `4.14`	Branch `4.13`	Branch `4.12`

README

Overview

OCaml is a functional, statically-typed programming language from the ML family, offering a powerful module system extending that of Standard ML and a feature-rich, class-based object system.

OCaml comprises two compilers. One generates bytecode which is then interpreted by a C program. This compiler runs quickly, generates compact code with moderate memory requirements, and is portable to many 32 or 64 bit platforms. Performance of generated programs is quite good for a bytecoded implementation. This compiler can be used either as a standalone, batch-oriented compiler that produces standalone programs, or as an interactive REPL system.

The other compiler generates high-performance native code for a number of processors. Compilation takes longer and generates bigger code, but the generated programs deliver excellent performance, while retaining the moderate memory requirements of the bytecode compiler. The native-code compiler currently runs on the following platforms:

	Tier 1 (actively maintained)	Tier 2 (maintained when possible)
x86 64 bits	Linux, macOS, Windows, FreeBSD	NetBSD, OpenBSD
x86 32 bits	Linux, Windows	FreeBSD, NetBSD, OpenBSD
ARM 64 bits	Linux, macOS	FreeBSD
ARM 32 bits	Linux	FreeBSD, NetBSD, OpenBSD
Power 64 bits	Linux
Power 32 bits	Linux
RISC-V 64 bits	Linux
IBM Z (s390x)	Linux

Other operating systems for the processors above have not been tested, but the compiler may work under other operating systems with little work.

Copyright

All files marked "Copyright INRIA" in this distribution are Copyright © 1996-2021 Institut National de Recherche en Informatique et en Automatique (INRIA) and distributed under the conditions stated in file LICENSE.

Installation

See the file INSTALL.adoc for installation instructions on machines running Unix, Linux, macOS, WSL and Cygwin. For native Microsoft Windows, see README.win32.adoc.

Documentation

The OCaml manual is distributed in HTML, PDF, and Emacs Info files. It is available at

https://ocaml.org/releases/latest/manual.html

Availability

The complete OCaml distribution can be accessed at

https://ocaml.org/docs/install.html

Keeping in Touch with the Caml Community

There is an active and friendly discussion forum at

https://discuss.ocaml.org/

The OCaml mailing list is the longest-running forum for OCaml users. You can email it at

[email protected]

You can subscribe and access list archives via the Web interface at

https://sympa.inria.fr/sympa/subscribe/caml-list

An alternative archive of the mailing list is also available at

https://inbox.ocaml.org/

There also exist other mailing lists, chat channels, and various other forums around the internet for getting in touch with the OCaml and ML family language community. These can be accessed at

https://ocaml.org/community/

In particular, the IRC channel #ocaml on Libera has a long history and welcomes questions.

Bug Reports and User Feedback

Please report bugs using the issue tracker at https://github.com/ocaml/ocaml/issues

To be effective, bug reports should include a complete program (preferably small) that exhibits the unexpected behavior, and the configuration you are using (machine type, etc).

For information on contributing to OCaml, see HACKING.adoc and CONTRIBUTING.md.

Separately maintained components

Some libraries and tools which used to be part of the OCaml distribution are now maintained separately. Please use the issue trackers at their respective new homes:

The Graphics library (removed in OCaml 4.09)
The Num library (removed in OCaml 4.06)
The OCamlbuild tool (removed in OCaml 4.03)
The camlp4 tool (removed in OCaml 4.02)
The LablTk library (removed in OCaml 4.02)
The CamlDBM library (removed in OCaml 4.00)
The OCamlWinTop Windows toplevel (removed in OCaml 4.00)

dscheck's People

Contributors

Stargazers

Watchers

Forkers

sudha247 lyrm art-w bartoszmodelski talex5

dscheck's Issues

Add a way to skip execution of traces

When debugging a failing case produced by qcheck-lin, it would be nice if dscheck could explore only the interleavings that produce the same intermediate values as reported by qcheck-lin (and skip the other execution traces that are less likely to have a bug). I think it would be enough to raise a specific exception to indicate that we don't need to continue exploring that way: (but not report it as a failure)

if value <> expected then raise Dscheck.Skip ; (* ... *)

(It almost works if the user does the try...with Skip -> () on their side, because the final postcondition is still called)

Record Atomic call locations

When reading a trace, it's not super easy to follow the Atomic operations in the original code... It would help if the filename:line of each operation was available (with Printexc.get_callstack and friends)

Add `Domain.join`

It's a bit of a pain to adapt existing code to dscheck because it doesn't have the standard Domain.spawn type (and Domain.join is missing). It would be super neat if enabling tracing only required this prelude:

open Dscheck

(* or *)
module Atomic = Dscheck.Atomic
module Domain = Dscheck.Domain

(with Domain.join added, I suspect that final wouldn't be required anymore?)

Randomness

Many lock-free implementations use some randomness as an easy way to load balance without coordination. This may be a source of nondeterminism and DSCheck does not like it. For example, say we have a test that begins with two threads wanting to put two elements into random slots in an array.

Run:

Thread A draws slot 3 and puts item there
Thread B draws slot 3 and fails
Thread B retries; draws slot 5 and puts item there
..
..
more steps

If DSCheck finds some races after step 3, it will keep rerunning the sequence 1-3 to explore events of interest that occur further down the sequence. But on the consecutive runs, thread B may succeed on the first try, yielding a different state. In the positive case, B will become disabled before its schedule and DSCheck crashes explicitly. In the more scary one, we may give a positive result without actually having explored all the interesting interleavings.

It's an easy mistake to make and a pretty difficult one to find. Perhaps, DSCheck should choose some random seed, and then reinitialize OCaml's prng to this value at the beginning of every run?

cc @lyrm, who's observed this issue in pratice, with dscheck crashing uncontrollably on ocaml-multicore/saturn#65

Validation for programs weaker than lock-free

Lots of real-world programs don't have lock-free property and it's often optimal that way. For one, because we mostly have a somewhat fair OS scheduler, and it does not make sense to optimize for the theoretical scenario, where one domain runs indefinitely. But, in absence of concrete bounds for unfairness, dscheck needs to explore all the possible Mazurkiewicz traces to prove properties about programs. In simple words, if thread A spins until thread B does something, dscheck will explore thread's A spinning to infinity.

There's a few things we could do to help here:

Add depth limit to the search. The easiest thing to do. It's mentioned in the DPOR paper and was used by CDSCheck guys. The problem is that we don't really know how to determine optimal depth. In the CDSCheck paper, they were looking for bugs in existing implementations to show that the algorithm is useful, in our case, obtaining negative results is just as important.
The papers also mention imposing some fairness conditions on the way DSCheck is scheduling things. I think it's potentially better than depth limit, but still a bit hard to ensure we're actually validating all the required states.
Letting user place yields. This has the disadvantage, that our atomic's interface will depart from stdlib's, so users have to add indirections (rather than just copy in our atomic in tests with dune rules). But it seems to be the best solution from the perspective of correctness, as it's easy for user to identify cycles in the state space. It also seems quite implementable (although we have to be careful when more than 1 thread is yielding at once).

I don't think that we can reasonably identify the cycles automatically, since we cannot know whether e.g. retries of a spinlock are bounded without reading local state.

Documentation requests

I think I'm a bit confused about how to use dscheck. Perhaps other people will be too.

The README says:

At the core, dscheck runs each test numerous times, exhaustively trying every single interleaving of atomic operations.

I tried adding some prints to it:

module Atomic = Dscheck.TracedAtomic
(* tested implementation needs to use dscheck's atomic module *)

let test_counter () =
  print_endline "\nstart";
  let counter = Atomic.make 0 in
  let incr () =
    print_endline "start incr";
    Atomic.set counter (Atomic.get counter + 1);
    print_endline "end incr";
  in 
  Atomic.spawn incr;
  Atomic.spawn incr;
  Atomic.final (fun () ->
      print_endline "end";
      Atomic.check (fun () -> Atomic.get counter == 2)
    )

let () = Atomic.trace test_counter

Many of the runs don't seem to finish. Is this expected? e.g. I get:

start
start incr

start
start incr

start
start incr
end incr

...

It would be useful if it could run without logging while searching, and then run the failing test case again with logging on. The default trace output doesn't seem very useful (e.g. it's hard to know what Process 3: compare_and_swap 14 is referring to).

I had assumed I could use non-atomic variables to keep track of the expected state and then compare that against the atomic values. e.g. adding a value to a lock-free queue and also to a regular Stdlib.Queue and checking the final contents are the same. Is that unsafe?

Does Atomic.spawn always run the function immediately, up to the first atomic operation? For example, this doesn't find the error:

let test_counter () =
  print_endline "\nstart";
  let x = Atomic.make true in   (* This is supposed to be false *)
  let might_be_set = ref false in
  Atomic.spawn (fun () ->
      print_endline "Setting...";
      might_be_set := true;
      Atomic.set x true;
      print_endline "Set";
    );
  Atomic.spawn (fun () ->
      print_endline "Checking";
      if not !might_be_set then assert (Atomic.get x = false)
    );
  Atomic.final (fun () ->
      print_endline "end";
    )

But switching the order of the spawns does find it. I guess if another process depends on the value then I need to use an atomic.

I tried doing this to encourage it to switch:

let yield = Atomic.make () in
...
Atomic.spawn (fun () -> Atomic.get yield; ...)

That worked if I put it in both spawns, but not in just one of them.

Basically, it would be useful to know what things are safe to do and which prevent it from testing all cases.

(Also: the example in the README doesn't do anything unless you add let () = Atomic.trace test_counter. It would be good to include this - maybe tested with MDX?).