Giter VIP home page Giter VIP logo

deptycheck's Introduction

DepTyCheck

Build and test Lint Documentation Status

A library for property-based testing with dependent types and automatic derivation of generators for Idris 2

Status

๐Ÿšง The library is under heavy construction ๐Ÿšง

For now, it lacks most of things required for normal property-based testing, like support for properties and fancy testing operations. Also, for now we do not support such important thing as shrinking.

The current focus for now is on test data generators and automatic derivation of them.

Generators

Generators in the simplest case produce a bunch of values of appropriate type:

genSomeStrings : Gen NonEmpty String
genSomeStrings = elements ["one", "two", "three"]

You can combine several generators into one:

genMoreStrings : Gen NonEmpty String
genMoreStrings = oneOf [genSomeStrings, elements ["more", "even more"]]

Note

All generators listed in oneOf are meant to be distributed uniformly between each other as a whole thing. That is, in genMoreStrings values "one", "two" and "three" have the same probability to be generated and this probability is 2/3 of probability for "more" or "even more" to appear.

So, the following generator is not equivalent to the genMoreStrings from above:

genMoreStrings' : Gen NonEmpty String
genMoreStrings' = elements ["one", "two", "three", "more", "even more"]

In genMoreStrings' all values are distributed uniformly.

To achieve the same result with reusing genSomeStrings generator, we need to dig into it deeper with alternativesOf function:

genMoreStrings'' : Gen NonEmpty String
genMoreStrings'' = oneOf $ alternativesOf genMoreStrings ++ alternativesOf (elements ["more", "even more"])

Note

There are also functions based on the alternativesOf, allowing to map values from all alternatives in a single expression named mapAlternativesOf and mapAlternativesWith.

You can also operate with alternatives as with applicative functors or monads.

For example, consider a generator of lists with length not greater than given. There are a lot of possible implementations, but consider a recursive one:

genListsN : Gen NonEmpty a -> (n : Nat) -> Gen NonEmpty $ List a
genListsN _    Z     = [| [] |]
genListsN genA (S n) = oneOf $ [| [] |]
                            :: [| [genA] :: alternativesOf (genListsN genA n) |]

Distribution of lengths of lists produced by this generator is uniform, thanks to alternativesOf function.

Note

If we were not using alternativesOf at all (say, with expression [| genA :: genListsN genA n |]), probability of getting a list of length n+1 would be 2 times less than getting a list of length n.

Generators can be combined with operations from Applicative interface:

data X = MkX String String

genStrPairs : Gen NonEmpty X
genStrPairs = [| MkX genSomeStrings genMoreStrings |]

Note

The number of alternatives acquired by alternativesOf function of an applicative combination of two generators is a product of numbers of alternatives of those generators.

Unlike, say, in QuickCheck, generators can be empty. This is important for dependent types. For example, Fin 0 is not inhabited, thus we need to have an empty generator if we want to have a generator for any Fin n (and sometimes we really want):

genFin : (n : Nat) -> Gen MaybeEmpty $ Fin n
genFin Z     = empty
genFin (S n) = elements' $ allFins n

As you can see from examples above, there is a special marking of generator's emptiness in the first type argument of the Gen type.

Generators are also monads:

genAnyFin : Gen MaybeEmpty Nat => Gen MaybeEmpty (n ** Fin n)
genAnyFin @{genNat} = do
  n <- genNat
  f <- genFin n
  pure (n ** f)

Distribution of a generator built by monadic bind is such that the whole continuation (i.e. RHS of >>=) applied to some x has the same probability to produce the result as the probability for x to appear.

Thus, in the example above probability of particular natural n to come in the result is the same as probability of this n to come on the given genNat. After that, all fins for each particular n are distributed evenly, because it is a property of genFin.

You need to use alternativesOf if you want all possible resulted values to be distributed evenly, e.g.:

genAnyFin' : Gen MaybeEmpty Nat => Gen MaybeEmpty (n ** Fin n)
genAnyFin' @{genNat} = oneOf $ do
  n <- alternativesOf genNat
  f <- alternativesOf $ genFin n
  pure (n ** f)

Here we are using special monadic syntax support for lists of generators produced by alternativesOf function.

In the last example, all results of genFin 1 and all results of genFin 2 would not be distributed equally in the case when genNat is elements [1, 2], when they are distributed equally in the example of genAnyFin. I.e., particular generator's application genAnyFin @{elements [1, 2]} would be equivalent to oneOf [pure (1**0), elements [(2**0), (2**1)]] where genAnyFin' @{elements [1, 2]} would be equivalent to elements [(1**0), (2**0), (2**1)]. Thus, they have the same domain, but different distributions.

Despite monadic bind of generators interprets left-hand side generators as a whole, it looks inside it when the resulting generator is being asked for alternatives by alternativesOf function or its variants.

For example, alternativesOf being applied to genAnyFin @{elements [1, 2]} would produce two alternatives. Sometimes this can be undesirable, thus, a forgetAlternatives function exists. It allows to forget actual structure of a generator in terms of its alternatives.

Consider one more alternative of genAnyFin, now the given genNat is wrapped with forgetAlternatives:

genAnyFin'' : Gen MaybeEmpty Nat => Gen MaybeEmpty (n ** Fin n)
genAnyFin'' @{genNat} = do
  n <- forgetAlternatives genNat
  f <- genFin n
  pure (n ** f)

In this case, alternativesOf being applied to genAnyFin'' @{elements [1, 2]} would return a single alternative.

Note

Search for alternatives through the series of monadic binds can go to the first generator that is produced with no alternatives.

Consider three generators:

  • do { e1 <- elements [a, b, c]; e2 <- elements [d, e, f]; pure (e1, e2) }
  • do { e1 <- elements [a, b, c]; e2 <- forgetAlternatives $ elements [d, e, f]; pure (e1, e2) }
  • do { e1 <- forgetAlternatives $ elements [a, b, c]; e2 <- elements [d, e, f]; pure (e1, e2) }

The first two generators would have three alternatives each when inspected by alternativesOf, where the third one would have only one.

But if you do alternativesOf for each of generators returned by the first alternativesOf and concatenate the results, the first example would give you nine alternatives, where the second one still would give you three. You can use deepAlternativesOf function with depth argument of 2 for this.

This, actually, violates monadic laws in some sense. Say, alternativesOf can distinct between pure x >>= f and f x if generator f x is, say, of form elements [a, b, c], because in the first case it would produce a single alternative, when in the second there will be three of them. However, according to the generated result these two generators shall be equivalent.

Note

Please notice that deepAlternativesOf can "see" through forgetAlternatives, so count of deep alternatives of depth 2 for the third case would still be 3. If you really need to hide the full structure even from the deepAlternativesOf, you can be much stronger version called forgetStructure:

  • do { e1 <- forgetStructure $ elements [a, b, c]; e2 <- elements [d, e, f]; pure (e1, e2) }

This this case both alternativesOf and even deepAlternativesOf would give you the single alternative. But please keep in mind that monadic composition of a generator with forgotten structure and some significantly empty generator function (like oneOf [g1, forgetStrcuture g2 >>= f]) may have unexpected distribution of values. Probability of values produced by the f related to the probability of values produced by g1 may be divided by the probability of the generator forgetStructure g2 >>= f to produce a non-empty value.

When you are using the weaker forgetAlternatives instead of forgetStructure, distributions are more predictable in case when g2 has some non-trivial structure of alternatives.

Also, here you can see that we can use generators as auto-parameters, thus no need in a separate thing like QuickCheck's Arbitrary.

You can also see that having a dependent type to generate, we can consider different ratio between given and generated type arguments. Technically speaking, Fin n and (n ** Fin n) are different types, but pragmatically, genFin and genAnyFin both generate elements of type Fin n, but the first one takes n as a given value, and the second one generates value for the type argument along with the main value.

You can read more on the design of generators in documentation.

Derivation of generators

DepTyCheck supports automatic derivation of generators using the datatype definition.

For now, it is not tunable at all, however, it is planned to be added.

Derived generators are total. Since for now generators are finite, it was decided to use explicit fuel pattern to support recursive data types. For simplicity, fuel pattern is used for all derived generators.

To invoke derivation, we use deriveGen macro:

genNat : Fuel -> Gen MaybeEmpty Nat
genNat = deriveGen

It uses very powerful metaprogramming facility of Idris 2 programming language for analysing the data structure which generator is derived for and producing code of the asked generator at compile time. This facility is called elaborator reflection, and you can find some kind of tutorial for this here. To enable it, you have to add %language ElabReflection to the source code before the first call to the macro.

However, beware of possible high resources consumption of this stage. In our non-trivial examples, derivation may take several hours and tens of gigabytes of memory. We work for improvements of this, however we hope that new core of the Idris compiler would solve this issue completely.

Of course, we do not support all possible dependent types. For example, for now, we do not support type-polymorphic types and GADTs which have function calls in type indices of their constructors. However, we are constantly working for widening supported types.

Note

For now, derivation is supported only for MaybeEmpty generators.

More on design of derivator can be found in documentation.

Usage and installation

For building and testing we use pack package manager.

Note

Notice, that we've gone mad as far as possible, so even calling for makefile of Sphinx for building documentation is done through pack and prebuild action inside an appropriate ipkg file.

Despite that, to make LSP working well, only a single code-related ipkg file is left in the top-level folder of the project.

The main pack's collection we test the library against is the one mentioned in the last section of the library version. Also, we test against the latest pack collection nightly. Here is the status of the latest such run: Nightly. We try to use as fresh version of the Idris 2 compiler and thirdparty libraries as possible.

deptycheck's People

Contributors

algebraicwolf avatar buzden avatar danmax03 avatar l3odr0id avatar simontsirikov avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

deptycheck's Issues

Support structured labels

Currently with the model coverage mechanism we can provide structured resulting information:

Whatever.FieldList.FieldIsTyped covered partially
  - Here: covered
  - There: not covered

Whatever.FieldList.FieldList covered fully
  - Ext: covered
  - Nil: covered

Data.Fin.Fin covered fully
  - FS: covered
  - FZ: covered

To achieve this, we are using labelling mechanism, which now supports only plain Strings as labels.

We could

  • support typed labels, so we can say that this is mentioning of a type or a constructor, or something else; open question: if set of this types should be open or preset;
  • support structured labels, so we can say that e.g. a constructor label is a sublabel of a type label; using this, we could get such structured labelling information automatically; open question: currently, model coverage has a macro that calculates the full set of possible labels beforehand, but just addition of a support for structured labels won't give this ability, so this won't replace the current mechanism of model coverage; maybe, we should support additional labelling mechanism as "potential labels", i.e., a generator says "oh, by the way, I can in theory use the following (structured) labels"; will this impact runtime performance badly?

Incorrect generators are derived for types that have `So` with a function application to a type indexed by another type

Steps to reproduce

Consider the following module that defines a type and asks for a generator:

import Test.DepTyCheck.Gen
import Test.DepTyCheck.Gen.Auto

%default total

%language ElabReflection

%hint
UsedConstructorDerivator : ConstructorDerivator
UsedConstructorDerivator = LeastEffort

data BoolP : Unit -> Type where
  TrueP : BoolP x
  FalseP : BoolP x

f : BoolP x -> Bool
f TrueP = True
f FalseP = False

data X : Type where
  MkX : (u : Unit) -> (x : BoolP u) -> So (f x) -> X

faultyGen : Fuel -> Gen X
faultyGen = deriveGen

Expected behaviour

All derives and typechecks OR a comprehensible error message giving a hint on what is wrong

Observed behaviour

The generator is derived, however it does not compile, instead an error occurs:

Error: While processing right hand side of faultyGen. Error during reflection: While processing right hand side of $resolved11714,<Main.X>[]. While processing right hand side
of $resolved11714,<Main.X>[],<<Main.MkX>>. Undefined name x.

Use `pack` for building instead of custom build system and submodules with dependencies

pack is a modern, cool and convenient way to do what we were doing in an ad-hoc way

May contain as substeps:

  • Usage of pack in CI for selection of particular Idris version instead of custom .idris-version file (#45)
  • Usage of pack for installing external libraries like elab-util and summary-stat in Makefile, thus removing top-level depends dir (#48)
  • Usage of pack for building deptycheck itself (#48)
  • Usage of pack for finding built deptycheck from tests, thus getting rid of ubiquitous depends dirs in tests (#48)
  • Move all installation tests to other sections (#48)
  • Make use of golden-runner-helper to run tests with pack test ... (#49)
  • Remove makefiles (#48, #49)
  • Usage of smaller libraries like if-implicit-unsolved and mtl-tupled-impls instead of actually having their modules here (#50)
  • Moving small common definitions for testing to small libraries with local pack.tomls instead of ubiquitous symlinks to the common files

Generators produce too similar results, i.e. distribution of generated values is poor

Problem

Now the main function of getting particular data from generators of type a gives a lazy list of type a being given some seed. These results are too likely to be very similar and to prefer data constructors declared first.

Suggestion

Suggestion is to not to have LazyList as the main generation data structure; it is suggested to store the structure of generator composition (approximately like LzList does) and to traverse it deeply using random seed state each time a value is requested.

Using this we lose the property of possibly full generation of all values, but practically there are too many of them, thus no need in it, actually.

Also, there will be no need to store the length in LzList-like data structure and in indexing function.

Error when building deptycheck in pack nightly generation

Currently, we don't get new nightly pack collections du to the following error:

2024-05-17T21:12:06.3447819Z [ build ] Error: While processing right hand side of canonicBody. Can't find an implementation for Elaboration ?m.
2024-05-17T21:12:06.3448431Z [ build ] 
2024-05-17T21:12:06.3448924Z [ build ] Deriving.DepTyCheck.Gen.Core:45:41--45:112
2024-05-17T21:12:06.3449418Z [ build ]  41 | ConstructorDerivator => DerivatorCore where
2024-05-17T21:12:06.3449859Z [ build ]  42 |   canonicBody sig n = do
2024-05-17T21:12:06.3450181Z [ build ]  43 | 
2024-05-17T21:12:06.3450572Z [ build ]  44 |     -- check that there is at least one constructor
2024-05-17T21:12:06.3451541Z [ build ]  45 |     when .| null sig.targetType.cons .| fail "No constructors found for the type `\{show sig.targetType.name}`"
2024-05-17T21:12:06.3452303Z [ build ]                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Manual fine-tuning of generator ordering

Ordering of subgenerators seems to have a great effect on the performance of generators. Part of these concerns are going to be addressed by applicative composition of generators described in #81. This would eliminate the need to choose orderings of independent generators, reducing the space of possible orderings.

Sometimes, when deriving a generator, a value that occurs in indices of several arguments of a data constructor must be produced. In that case, there is a choice -- we need to pick an argument that will be generated together with a value, and the rest of the arguments will get that value as a given index. This choice could greatly influence the performance of resulting generator.

Problematic case

To make an illustrated example, let's consider a simple language. It shall consist of a sequence of assigning arithmetic expressions to variables. Already declared variables can be used in following expressions. A simple program of this language would look like this:

x2 = 2 * 5 + 7
x1 = x2 * 4
x3 = x1 - 3

The language specification will follow the following scheme (full implementation is here):

data Expression : Context -> Type where
  ...

data Lang : Context -> Type where
  Empty : Lang []
  Assign : (nm : Name) -> (expr : Expression ctx) -> (rest : Lang ctx) -> Lang (nm::ctx)

Then, if we want to produce the generator for Lang[], we can choose one of two orderings for data constructor Assign:

  1. Expression[] and Lang[0] -- the one currently chosen by DepTyCheck,
  2. Expression[0] and Lang[]

As it turns out, the second ordering performs significantly better. The number of constructors used in expressions depends linearly on the average time to generate a sample:

image

The number of variables introduced shows dependency on time akin to logarithmic:

image

Generation time increases exponentially w.r.t. the fuel provided:

image

I was not able to build similar plots for either derived or handwritten generator that would be structured like a derived one due to spurious slow downs in generation for fuel greater than 2: the generator would freeze after several examples for unreasonably long periods of time.

Possible solutions

Because of the dramatic difference in performance, it would be beneficial to give user control over the orderings of generators. One solution that I have in mind is to supply user with special type wrappers:

public export
Given : List String -> Type -> Type
Given _ ty = ty

public export
Generated : List String -> Type -> Type
Generated _ ty = ty

Using them, it is possible to manually choose arguments for which indices will be derived or generated:

data Lang : Context -> Type where
  Empty : Lang []
  Assign : (nm: Name) -> (expr : Given ["ctx"] (Expression ctx)) -> (rest : Generated ["ctx"] (Lang ctx)) -> Lang (nm::ctx)

Or, alternatively

public export
Given : a -> a
Given x = x

public export
Generated : a -> a
Generated x = x

That will be used like this:

data Lang : Context -> Type where
  Empty : Lang []
  Assign : (nm: Name) -> (expr : Expression (Given ctx)) -> (rest : Lang (Generated ctx)) -> Lang (nm::ctx)

Whichever one would be easier to work with.
These wrappers should be treated specially during derivation before normalising the constructor types. It should be optional for the user to provide the wrappers, and we must not require annotating all arguments -- derivation engine should start with what the user had specified and work out the rest.

Of course, there are some caveats associated with this approach:

  • User-defined annotations might turn out to be nonsensical -- one might choose to annotate all occurrences of an index with Given modifier, or make several Generated occurrences. We need to choose how to handle such scenarios.
  • User might want to derive several generators with different signatures. In that case, the appropriate set of modifiers is likely to be different. Of course, this could be solved by creating several data types, and writing trivial conversion functions between them. However, this would lead to a great amount of boilerplate code and repetition.

Bad error message when configuration is insufficient

Steps to reproduce

Consider a very simple module which asks for derivation of a generator:

import Test.DepTyCheck.Gen.Auto

%language ElabReflection

g : Fuel -> Gen Unit
g = deriveGen

Expected behaviour

All derives and typechecks OR a comprehensible error message giving a hint on what is wrong.

Observed behaviour

Error: While processing right hand side of g. Bad elaborator script (canonicBody (MkGenSignature (MkTypeInfo (NS (MkNS ["Builtin"]) (UN (Basic "Unit"))) [] [MkCon (NS (
MkNS ["Builtin"]) (UN (Basic "MkUnit"))) [] (IVar EmptyFC (NS (MkNS ["Builtin"]) (UN (Basic "Unit"))))]) (keySet (M Empty))) (nameForGen (MkGenSignature (MkTypeInfo (NS
 (MkNS ["Builtin"]) (UN (Basic "Unit"))) [] [MkCon (NS (MkNS ["Builtin"]) (UN (Basic "MkUnit"))) [] (IVar EmptyFC (NS (MkNS ["Builtin"]) (UN (Basic "Unit"))))]) (keySet
 (M Empty))))) .unRWST (fromList ((fromList ((MkGenExternals []) .externals <&> (\lamc => let (sig, _) = lamc in (sig, nameForGen sig)))) .asList <&> (\namedSig => (fst
 (internalise (fst namedSig)), namedSig)))) (M (M 0 (Leaf (MkGenSignature (MkTypeInfo (NS (MkNS ["Builtin"]) (UN (Basic "Unit"))) [] [MkCon (NS (MkNS ["Builtin"]) (UN (
Basic "MkUnit"))) [] (IVar EmptyFC (NS (MkNS ["Builtin"]) (UN (Basic "Unit"))))]) (SetWrapper (M Empty))) (UN (Basic "<Builtin.Unit>[]"))))) ([], []) (script is not a d
ata value).

X:8:5--8:14
 4 | 
 5 | %language ElabReflection
 6 | 
 7 | g : Fuel -> Gen Unit
 8 | g = deriveGen
         ^^^^^^^^^

Derivation fails when record update is used as a data index

Consider the following example:

record X where
  constructor MkX
  field : Nat

data Y : X -> Type where
  Start : Y r
  Next : Y ({field $= S} r) -> Y r

genY : Fuel -> (x : X) -> Gen MaybeEmpty $ Y x
genY = deriveGen

I'd expect this derivation to succeed, but it fails with

While processing right hand side of genY.
Error during reflection:
While processing right hand side of $resolved13352,<Deriving.DepTyCheck.Gen.Y>[0].
While processing right hand side of $resolved13352,<Deriving.DepTyCheck.Gen.Y>[0],<<Deriving.DepTyCheck.Gen.Next>>.
Deriving.DepTyCheck.Gen.case block in $resolved13323 is not accessible in this context.

I suspect coding of field update syntax through some case block does this, maybe we need to workaround it somehow on the library side.

Use layered (staged) applicative composition at derivation to improve generation

Monadic composition may be ineffective or make distribution bad is there a raw generator lies around. Thus, we can use applicative composition as much as we can in order to generate first values that we can.

Thus, at each moment inside generation we can see which values can be generated (i.e., which values do not depend on any other value which needs to be generated) and generate them in a single pass by composing appropriate generators applicatively. Then we can have the second round, because at this stage of generation some of values can be considered independent, because all values they depend on, are already present. And so on.

This should, by the way, reduce count of possible "orderings" (when calling for a particular data type constructor) which we suffer from currently.

My examples from thinking about this in May '23:

import Test.DepTyCheck.Gen

data X : Type

data Y : Type

data F : Type

fun : Y -> F

data Z : Type

data B : X -> F -> Type

data C : X -> Type

data D : Y -> Type

data D' : Y -> Type

data E : X -> Z -> Type

data T : Type where
  MkA : (x : X) -> (y : Y) -> (z : Z) -> B x (fun y) -> C x -> D y -> D' y -> E x z -> T

genB_f : (f : F) -> Gen (x ** B x f)
genB_xf : (x : X) -> (f : F) -> Gen (B x f)

genC : Gen (x ** C x)
genC_x : (x : _) -> Gen (C x)

genD : Gen (y ** D y)
genD_y : (y : _) -> Gen (D y)

genD' : Gen (y ** D' y)
genD'_y : (y : _) -> Gen (D' y)

genE : Gen (x ** z ** E x z)
genE_x : (x : _) -> Gen (z ** E x z)

genT : Gen T
genT = oneOf

  [ -- (E + D) * (D' + B + C)
    do ((x ** z ** e), (y ** d)) <- (,) <$> genE <*> genD
       (d', b, c) <- (,,) <$> genD'_y y <*> genB_xf x (fun y) <*> genC_x x
       pure $ MkA x y z b c d d' e

  , -- D * (D' + B * (C + E))
    do (y ** d) <- genD
       let l = genD'_y y
       let r : Gen (x ** (B x (fun y), C x, (z ** E x z))) := do
         (x ** b) <- genB_f (fun y)
         (c, (z ** e)) <- (,) <$> genC_x x <*> genE_x x
         pure (x ** (b, c, (z ** e)))
       (d', (x ** (b, c, (z ** e)))) <- (,) <$> l <*> r
       pure $ MkA x y z b c d d' e

  , -- (C + D) * (B + D' + E)
    do ((x ** c), (y ** d)) <- (,) <$> genC <*> genD
       (b, d', (z ** e)) <- (,,) <$> genB_xf x (fun y) <*> genD'_y y <*> genE_x x
       pure $ MkA x y z b c d d' e

  -- ... and other orders, say, D * (D' + C * (B + E)), (E + D') * (D + B + C)
  ]

This suspiciously resembles Arrows composition, thus we may think (again) on 1) automation of arrows computation 2) representing generators as arrows.

Support derivation of `NonEmpty`-generators, in case the type is indeed non-empty

In fact, this can be done in two ways:

  • one is without any change in the derived code; one would just fail to typecheck against type signature with NonEmpty in case when actually there is a possibility of emptiness due to actual property of a type;
  • another is to analyse deeply during derivation a possibility of emptiness; may be hard to track, since indeed non-empty generator may use maybe-empty generator as a subgenerator in case it's baked by some nice alternative.

Say, it should be okay to have

data X : Nat -> Type where
  JustAValue : X n
  WithFin : Fin n -> X n

genX : Fuel -> (n : Nat) -> Gen NonEmpty $ X n
genX = deriveGen

-- Just to remind: `Fin 0` is an empty type

Consider simple generators for decidable types

Maybe during decision of calling subgenerator, before derivation we should check for appropriate Dec instance and use it instead of a derived generator. Dec instances likely to be present for "semantically contractible" types, so it's enough to use those returned in Dec's Yes, if it's present. Surely, if Dec returns No, there's no need to derive a generator (for an independent type) or to return an empty generator (for a dependent type, whose Dec instance contains additional arguments).

Lengths of generated lists of derived sorted lists are too short

In the sorted-list example, there is a data type for sorted list, which has a derived generator. For some reason, this derived generator always produces lists of length ~2-3, sometimes 4, not more, even of fuel is over say, 100. Influence of the fuel to the values inside lists are visible clearly, but somewhy this does not affect the length of the generated lists.

Pre-calculate and filter out combinations of unfortunate orderings, especially given indices

In the case when we have a data type with a constructor with (at least) two arguments that depend on one, say

data X : Type where
  MkX : Y n -> Z n -> X

we treat both types Y and Z equally, and try both

  do ...
     (n ** y) <- genY
     z <- genZn n
     pure $ MkX {n} y z

and

  do ...
     (n ** z) <- genZ
     y <- genYn n
     pure $ MkX {n} y z

in usual LeastEffort, and arbitrary one of those two when simplificationHack option it turned on.

But in case when, say, Y has n as index, and Z has n as a parameter, it's much better to prefer first generation of Y and only after Z with the given n. The same is applicable when, say, Y has only constructors and free variables in its indices, when Z has both constructors, free variables and calls non-reversible functions -- in this case, we should first generate value for Z along with some n, and have given arguments for Y -- because we can't generate a generator for Z with the given n because of those non-reversible functions.

To do this, we could filter out unfortunate orders in the LeastEffort. First we should try to filter out those orders, when any given parameter is actually an index. If we get an empty list of orders, then we should start from the begining and filter out only those, which contains calls to non-reversible functions in indices. And only after that, we should apply the simplification hack, if this option is turned on.

This both would increase generation time, because of reduced runtime filtering, and in some cases it would allow some functions in indices where in other cases we how get an error of unsupported data type.

I wonder if this is actually always fixed by solution of #55 or not.

Small change to the data type leads to great increase of resources consumption

Derivation for the following data type using LeastEffort derivation algorithm completes pretty fast:

data X : Type where
  MkX : (n, k : Nat) -> Fin (n + k) -> X

checkedGen : Fuel -> Gen X
checkedGen = deriveGen @{MainCoreDerivator @{LeastEffort}}

It takes ~5.3 Gb of memory and ~5.5 min of time.

Steps to reproduce

Now, add some more (dependent) parameter to the constructor:

data X : Type where
  MkX : (n, k : Nat) -> (f : Fin $ n + k) -> So (finToNat f < n + k) -> X

Expected behaviour

Derives with similar resources consumption

Observed behaviour

Takes ~28 Gb of memory and ~1/2 hour (on the same machine) to typecheck

Consider given arguments and/or actual type arguments in the model coverage

Currently labelling for the model coverage is automated by derivation mechanism for derived generators and by withCoverage macro for hand-written generators.

Currently if some type is used only as a given parameter (this happens for topmost generator and for arguments that are set by GADT and never claimed otherwise, say, never stored in any constructor of the type itself), it will be present in the model coverage report, but never will be marked as covered, even if it is actually used with an appropriate matter.

This can be done by calling a-la withCoverage for each of the given argument of the generator function. Also, we may need to consider left parts of the dependent pair of the returned value. Perhaps, the whole expressions of the type arguments should be considered too.

There will be some complications with 0-arguments, as soon as we will support Exists-like dependent pairs as the generator's return type. I think, we will need to change the algorithm of potential model coverage collection to consider that some types may not be present in runtime, and thus are allowed to be not covered (unless, a runtime generator for this value is present). Surely, we need to be aware that the same type can be present both in an erased and runtime position, thus we need to take the runtime-most one state in a potential model coverage.

Derived generators have incorrect order of `decEq` applications

Steps to reproduce

Consider the following module that asks for derivation of a generator:

import Test.DepTyCheck.Gen
import Test.DepTyCheck.Gen.Auto

%default total

%language ElabReflection

%hint
UsedConstructorDerivator : ConstructorDerivator
UsedConstructorDerivator = LeastEffort

data X : Type 

data Y : X -> Type where
  MkY : Y x

data X : Type where
  Nil : X
  Cons : (x : X) -> Y x -> X

data Z : X -> X -> Type where
  MkZ : (prf : Y x) -> Z (Cons x prf) (Cons x prf)

faultyGen : Fuel -> (x : X) -> (x' : X) -> Gen $ Z x x'
faultyGen = deriveGen

Expected behaviour

All derives and typechecks OR a comprehensible error message giving a hint on what is wrong

Observed behaviour

Error: While processing right hand side of faultyGen. Error during reflection: While processing right hand side
of $resolved11708,<Main.Z>[0, 1]. While processing right hand side of $resolved11708,<Main.Z>[0, 1],<<Main.MkZ>>. When unifying:
    Y to_be_deceqed^^x0
and:
    Y x
Mismatch between: to_be_deceqed^^x0 and x.

Do not spend fuel when given index is structurally decreasing

Currently whenever derived generator recursively calls itself with a structurally smaller given parameter, the fuel is spent nonetheless.
Consider the following example:

data NatVect : Nat -> Type where
  Nil : NatVect 0
  (::) : Nat -> NatVect n -> NatVect (S n)

genNatVect : (Fuel -> Gen Nat) => Fuel -> (n : Nat) -> Gen $  NatVect n
genNatVect = deriveGen

In that case, if desired length is greater than fuel, no output would be produced, because the generator would run out of fuel before producing anything. Decrease of fuel, however, is not required here -- generator calls itself with decreasing given parameter, hence ensuring termination.

Perhaps it would be better to avoid spending fuel in such cases?

Unification error (same name should be used) in derived generator

Steps to reproduce

Consider the following data type:

data X : (n : Nat) -> Fin n -> Type where
  MkX : (n : _) -> (f : _) -> X (S n) (FS f)

Derive generator with both type argument given (e.g. using LeastEffort derivation algorithm):

checkedGen : Fuel -> (n : Nat) -> (f : Fin n) -> Gen $ X n f
checkedGen = deriveGen

Expected behaviour

Derives and typechecks well.

Observed behaviour

Error during reflection: While processing right hand side of {u:3614},<DerivedGen.X>[0, 1]. While processing left hand side of {sa:3639},<<DerivedGen.MkX>>. When unifying:
    S ?to_be_deceqed^^n0
and:
    S ?n
Pattern variable n unifies with: ?to_be_deceqed^^n0 [locals in scope: ^outmost-fuel^, n, f, {arg:3490}, ^fuel_arg^].

Suggestion: Use the same name for both pattern variables, since they unify.

Print percent of *tests* where element of a model coverage is involved

Currently, after the #94, count of presence in a model coverage is printed along with coverage status.

Now it looks like this (takes from test here):

PrintCoverage.Y covered partially (100 times)
  - Y1: covered (41 times)
  - Y2: not covered
  - Y3: covered (59 times)

There were 100 tests generated, Y was used in each test once, thus the whole count of usages of the type Y is 100% for the tests, where individual actually generated constructors have 41% and 59% respectively.

But when type is recursive, or just is used several times, one can get percentage >100% (taken from another test here):

PrintCoverage.X covered partially (118 times)
  - X1: covered (54 times)
  - X2: not covered
  - X3: covered (64 times)

PrintCoverage.Y covered partially (100 times)
  - Y1: covered (41 times)
  - Y2: not covered
  - Y3: covered (59 times)

Actually, type X is present in 59% of the tests, along with the Y3 constructor, but each time it is present twice. If we were printing this with a percentage, 118% near to the X type would be a misleading number, where "present in 59% of tests" could be feasible one.

So, it is suggested to add percentage of presence in the tests, but we must be accurate and (unlike the current times coverage) to not to count each model coverage element several times in a single test.

The current design does not support this directly, we don't store this type of data. There are at least two possible solutions to this:

  • change the order of ModelCoverage collection so that each test uses registerCoverage independently, and thus is can be changed to calculate percentage of mentions of types and constructors per test; I don't like this solution because of potential high count of calls to relatively expensive registerCoverage function;

  • to have two separate Semigroups implemented for the ModelCoverage type: one is for use with the Writer inside appropriate unGen which does not accumulate the usage-per-test metric, and is not exported outside the module; and the other which does accumulate usage-per-test metric and is exported outside; the latter one should be declared in the bottom of the module to not be used accidentally in the Writer context.

Migration to the modern `elab-util` version

Currently an old pre-overhaul version of the elab-util library is used. I tried to migrate in a single hop a lot of times, but didn't even manage to finish. Currently, I see the migration to be possible in the following steps:

  • Copying old pre-overhaul parts of Language.Reflection.Types into DepTyCheck (with permission of Stefan) and switching to them (#137);
  • Removing special fork from pack.toml (#137);
  • Using modern pretty-printer from elab-pretty library (#137);
  • Switching from Arg True to Arg False everywhere possible to smoothen transition to modern unparametesired Arg (#138);
  • Switching from outdated Arg True (aka NamedArg) type to moder Arg type and NamedArg predicate (#138);
  • Moving to the modern indexed Con and TypeInfo.

Bad file position in an error message when something wrong with an argument of a function type with a named argument

Derived generators currently cannot have unnamed explicit arguments, e.g. this is wrong and should fail:

gen : Fuel -> ((a : Nat) -> Nat) -> Gen MaybeEmpty Nat
gen = deriveGen

Also, there cannot be any unused explicit arguments, e.g. this is wrong and should fail:

gen' : Fuel -> (d : (a : Nat) -> Nat) -> Gen MaybeEmpty Nat
gen' = deriveGen

And both these derivation indeed fail, as should. But their error messages are a bit misleading:

Error: While processing right hand side of gen. Error during reflection:
Explicit argument must be named and must not shadow any other name

<filename>:12:17--12:18
 11 |
 12 | gen : Fuel -> ((a : Nat) -> Nat) -> Gen MaybeEmpty Nat
                      ^
Error: While processing right hand side of gen'. Error during reflection:
Given parameter is not used in the target type

<filename>:15:22--15:23
 14 |
 15 | gen' : Fuel -> (d : (a : Nat) -> Nat) -> Gen MaybeEmpty Nat
                           ^

They should highlight the whole type, or at least argument's name (if present) instead, something like this:

Error: While processing right hand side of gen. Error during reflection:
Explicit argument must be named and must not shadow any other name

<filename>:12:17--12:18
 11 |
 12 | gen : Fuel -> ((a : Nat) -> Nat) -> Gen MaybeEmpty Nat
                     ^^^^^^^^^^^^^^^^

and one of these:

Error: While processing right hand side of gen'. Error during reflection:
Given parameter is not used in the target type

<filename>:15:22--15:23
 14 |
 15 | gen' : Fuel -> (d : (a : Nat) -> Nat) -> Gen MaybeEmpty Nat
                      ^

or

Error: While processing right hand side of gen'. Error during reflection:
Given parameter is not used in the target type

<filename>:15:22--15:23
 14 |
 15 | gen' : Fuel -> (d : (a : Nat) -> Nat) -> Gen MaybeEmpty Nat
                          ^^^^^^^^^^^^^^^^

or

Error: While processing right hand side of gen'. Error during reflection:
Given parameter is not used in the target type

<filename>:15:22--15:23
 14 |
 15 | gen' : Fuel -> (d : (a : Nat) -> Nat) -> Gen MaybeEmpty Nat
                      ^^^^^^^^^^^^^^^^^^^^

Get rid of need for explicit ubiquitous fuel pattern

Current generators of type Gen are explicitly finite. Thus, when we define recursive generators, we use fuel pattern everywhere, especially in derived generators. Actually, for simplicity, we use it always for derived generators, even when it is actually not needed.

By the way, using Fuel -> Gen ... instead of just Gen ..., we run into problems in the compiler, e.g. idris-lang/Idris2#2522

We could have a possibly-infinite generators using co-data inside and to provide fuel only at the end of the day, when producing particular values.

We need to decide whether this is useful to have a distinction between definitely finite and possibly infinite generators, or not. This distinction makes it easier to run finite generators (no fuel required) but makes generators combinations and derivation more complex.

Fill in tutorials in the docs

What should be in the tutorial is written here.

At least these are planned:

  • "Test your first function"
  • Dedicated hand-written generators
  • Dedicated derivation

Support using non-`MaybeEmpty` generators when deriving even `MaybeEmpty` ones

Currently if you derive MaybeEmpty generator, and want to pass an external one, it also should be MaybeEmpty, i.e., say, when you have

record X where
  constructor MkX
  one : Nat
  another : Nat

now you cannot derive the following generator:

genX : Fuel -> (Fuel -> Gen NonEmpty Nat) -> Gen MaybeEmpty X
genX = deriveGen

Once #176 is solved, this would be needed for NonEmpty generators too.

Manage `(l = r) -> Void` fields in constructors specially

When some (l = r)- > Void (i.e. Not (l = r)) instances need to be generated for a constructor, we could either derive a special generator for this type like

genNotX : (l, r : X) -> Gen MaybeEmpty $ (l = r) -> Void
genNotX l r = case decEq l r of
  Yes _ => empty
  No co => pure co

or just use this type of expression instead of a subgenerator call (the first seems to be more preferable, since it should work for any constructor generation algorithm).

Generator above works well only when both arguments are given. Maybe, we should always make generation order in a such way that these generators only "filter", i.e. they always take both their generators as givens.


However, it is possible to derive such generators with one derived and one generated argument. It should be possible at least when DecEq is derivable; however I'm not sure it's feasible in a general case.


Anyway, since generators with both given arguments seems to be not really sparse, so, maybe this hard derivation is not really needed. this needs some investigation.

Automatic data types fusion when deriving generator for constructor

This can be seen as generation-time optimisation by growing the size of derived code. @AlgebraicWolf tested fusion in an ad-hoc example and it gave better results. We can generalise this.

The idea is to generate a data type for each data constructor (which is of a (complex?) dependent type) which would contain all constraints of data types of all constructor parameters. When we generate arguments for this particular constructor, instead of a sequence of subgeneration calls and then assembling it to the constructor's call, we can simply generate that specialised data structure and use it to generate the constructor call.


For example, consider couple of GADTs:

data X : Nat -> Type where
  X1 : Double -> X1
  Xn : (n : Nat) -> X (S n)

data Y : Nat -> Type where
  Y0 : String -> Y 0
  Y1 : Char -> Y 1
  Yn : Y n

If we derive a generator for data type Z defined like

data Z : Type where
  Zx : Z
  Zn : (n : Nat) -> X n -> Y n -> Z

we now make generator for Z to be alternatives of generator for Zx and generator for Zn, where generator of Zn is an alternative between approximately

do (n ** xn) <- genAnyX
   yn <- genY n
   pure $ Zn n xn yn

and

do (n ** yn) <- genAnyY
   xn <- genX n
   pure $ Zn n xn yn

What is suggested here is to create a special data type

data ForZn : Type where
  ForZn_X1_Y1 : Double -> Char -> ForZn
  ForZn_X1_Yn : Double -> ForZn
  ForZn_Xn_Y1 : Char -> ForZn
  ForZn_Xn_Yn : (n : Nat) -> ForZn

which contains all possible data from all possible arguments of the constructor Zn in a fused way. So, there exists a function
zn : ForZn -> Z which produces all possible calls to constructor Zn:

zn : ForZn -> Z
zn $ ForZn_X1_Y1 d k = Zn $ 1 (X1 d) (Y1 k)
zn $ ForZn_X1_Yn d   = Zn $ 1 (X1 d) Yn
zn $ ForZn_Xn_Y1 c   = Zn $ 1 (Xn 0) (Y1 k)
zn $ ForZn_Xn_Yn n   = Zn $ (S n) (Xn n) Yn

So, generator for Zn can simply look like zn <$> genForZn, and generator for ForZn can be simply derived as usual (it does not contain dependent types in its constructors, so this mechanism should not be applied to it recursively).

Derived generator sometimes contain an unreachable clause

Steps to reproduce

TBD (All current examples I have are extremely large)

Expected behaviour

Generator derives with no error or warning

Observed behaviour

Warning: Unreachable clause: (<<P.DCF>>) (^outmost-fuel^) external^<P.I>[] external^<^prim^.Bits32>[] external^<^prim^.Bits64>[] external^<Data.Fin.Fin>[0] external^<P.PT>[] external^<P.C>[] external^<P.AT>[] external^<P.MAT>[] external^<P.VMAT>[0] accTy rs retTy {arg:3541} {arg:3538} (^fuel_arg^) _ _ _

Implement `TTImp` traversals in `Util.Reflection` through `mapATTImp`

allVarNames and hasNameInsideDeep do typical traversals of TTImp, with some gluing function (++ and || respectively) and separate tratment of particular TTImp constructors (IVar, in particular). This can be done by inventing a special applicative type, which has a phantom type argument and contains the result of the required traversal (List Name and m Bool respectively). This special applicative type can excersise the gluing function in its implementation of <*>.

The only concern I have to that is that I'm not sure that this approach retains laziness of LazyList-resulting traversals.

Derive proof of completeness for derived generators

Currently, there is a strong reasons to believe that derived generators are (intensionally) complete - in our case, this means that in the tree representing a derived generator for certain type (for each value of this type) there exists a leaf node that contains this value. However, there is no proof that this completeness holds. For correctness reasons, it might be useful to derive a proof of intensional completeness along with the generator.

Move data types and utilities to separate reusable packages

Blocked by #27. This is rather easy to do as soon as we switch nicely to pack as a build system.

DepTyCheck package should be split to:

  • Values generation facility
  • Property-based testing part (not implemented yet)
  • Automatic derivation of generators

Parts that can be moved out:

  • Unsolved implicit defaulting
  • Tupled instances for Control.Monad.* (+ implementation of additional instances for reader and writer; and all for RWST);
  • Dependent vector
  • Nat1
  • List with controlled emptiness (CEList)
  • System.Random.Pure
  • Reflection-related utilities
  • Syntax utilities of style "I hate parens"
  • Collection utilities
  • Utilities for Fins
  • Utilities for Alternatives
  • Stuff for statistics-related calculations

Derived generator fails to compile with a "Non linear pattern variable" error

Steps to reproduce

Consider the following module:

import Test.DepTyCheck.Gen.Auto

%default total

%language ElabReflection

%hint
UsedConstructorDerivator : ConstructorDerivator
UsedConstructorDerivator = LeastEffort

data Y : Unit -> Unit -> Type where
 MkY : Y x y

data X : Type where
  MkX : Y x x -> X

faultyGen : Fuel -> Gen X
faultyGen = deriveGen

Expected behaviour

All derives and typechecks OR a comprehensible error message giving a hint on what is wrong.

Observed behaviour

Error: While processing right hand side of faultyGen. Error during reflection: While processing right hand side of $resolved11875,<Main.X>[]. While processing right hand side of $resolved11875,<Main.X>[],<<Main.MkX>>. Can't match on ?x [no locals in scope] (Non linear pattern variable).

Derivation fails when constructor in dependent type index has an unnamed auto implicit argument

Steps to reproduce

Consider the following module which asks for derivation of a generator:

import Test.DepTyCheck.Gen
import Test.DepTyCheck.Gen.Auto

%default total

%language ElabReflection

%hint
UsedConstructorDerivator : ConstructorDerivator
UsedConstructorDerivator = LeastEffort

mutual
  data X : Type where
    Nil : X
    (::) : (x : Unit) ->
           (xs : X) ->
           Y x xs =>
           X

  data Y: Unit -> X -> Type where
    A : Y n []
    B : Y n xs ->
        Y x xs =>
        Y n (x::xs)


faultyGen : Fuel -> Gen X 
faultyGen = deriveGen

Expected behaviour

All derives and typechecks OR a comprehensible error message giving a hint on what is wrong.

Observed behaviour

Error: While processing right hand side of faultyGen. Error during reflection: Unsupported name `{conArg:933}` of a parameter used in the constructor

FaultyGen:31:9--31:20
 27 |   data Y: Unit -> X -> Type where
 28 |     A : Y n []
 29 |     B : Y n xs ->
 30 |         Y x xs =>
 31 |         Y n (x::xs)
              ^^^^^^^^^^^

Support paralellised running of generators

Currently, unGen is very linear, it runs one stuff right after the other. We could support running the generator by translating it to a special thing that could be run in parallel. E.g., imagine that Coop supports actually parallelised run. Then we could have unGenCoop : Gen1 a -> Coop m a, which actually runs all, say, applicatives in parallel.

We could imagine, possibly, a computation backed by coop (once it has a parallel backend) or async.

To support all of that, we would need to constantly split our random numbers generators, e.g. like it's done in falsify library, to make halves of, say, applicatives be able to run indenendently. This seems to be feasible anyway, and even may lead to an ability to shrink nicely, like it's done in falsify.

Strange error when using direct `DPair` instead of `(... ** ...)` syntax of a derived generator return type

I had a definitions and a derived generator

data NameIsNew : (funs : Funs) -> (vars : Vars) -> UniqNames funs vars -> String -> Type

genNewName : Fuel -> (Fuel -> Gen MaybeEmpty String) =>
             (funs : Funs) -> (vars : Vars) -> (names : UniqNames funs vars) ->
             Gen MaybeEmpty (s ** NameIsNew funs vars names s)

and all worked, but once I change the return type to be in form of

             Gen MaybeEmpty $ DPair String $ NameIsNew funs vars names

derivation suddenly crashes with

Error: While processing right hand side of genNewName. Error during reflection: Target type's argument must be a variable name

with non-var expression being String.

Suddenly, replacing the return type with

             Gen MaybeEmpty $ DPair String $ \x => NameIsNew funs vars names x

helps with this.

Unification error after matching in `Yes Refl` inside derived generators

Steps to reproduce

Consider the following module which defines two types and asks for a generator of one of them:

import Test.DepTyCheck.Gen
import Test.DepTyCheck.Gen.Auto

%default total
%language ElabReflection

%hint
UsedConstructorDerivator : ConstructorDerivator
UsedConstructorDerivator = LeastEffort

data X : Type where
  Nil : X
  (::) : Unit -> X -> X

DecEq X where
  [] `decEq` [] = Yes Refl
  [] `decEq` (y :: ys) = No $ \case Refl impossible
  (x :: xs) `decEq` [] = No $ \case Refl impossible
  (() :: xs) `decEq` (() :: ys) = case xs `decEq` ys of
                                    (Yes prf) => rewrite prf in Yes Refl
                                    (No contra) => No $ \case Refl => contra Refl

data Y : (xs : X) -> (ys : X) -> Type where
  A : Y (x :: xs) (x :: xs)
  B : Y xs ys -> Y (x :: xs) (y :: ys)

faultyGen : Fuel -> (xs : X) -> (ys : X) -> Gen $ Y xs ys
faultyGen = deriveGen

Expected behaviour

All derives and typechecks OR a comprehensible error message giving a hint on what is wrong

Observed behaviour

The generator is derived, however it does not compile, instead an error occurs:

Error: While processing right hand side of genY. Error during reflection: While processing right hand side of $resolved11735,<Main.Y>[0, 1]. While processing right hand side
of $resolved11735,<Main.Y>[0, 1],<<Main.A>>. When unifying:
    Y (x :: to_be_deceqed^^xs1) (x :: to_be_deceqed^^xs1)
and:
    Y (to_be_deceqed^^x0 :: to_be_deceqed^^xs1) (to_be_deceqed^^x0 :: to_be_deceqed^^xs1)
Mismatch between: x and to_be_deceqed^^x0.

Possible explanation

If we examine the code of a derived generator, the function of roughly the following structure is produced:

faultyGen : Fuel -> (xs : X) -> (ys : X) -> Gen $ Y xs ys
faultyGen fuel_arg xs ys =
  let genA : Fuel -> (xs : X) -> (ys : X) -> Gen $ Y xs ys
       genB : Fuel -> (xs : X) -> (ys : X) -> Gen $ Y xs ys

       genA cons_fuel (x :: xs) (to_be_deceqed_x0 :: to_be_deceqed_xs1) =
         case decEq x to_be_deceqed_x0 of
                 (No _) => empty
                 (Yes Refl) => case decEq xs to_be_deceqed_xs1 of
                                              (No _) => empty
                                              (Yes Refl) => ?something_interesting
      genA _ _ _ = empty

      genB cons_fuel (x :: xs) (y :: ys) = ?genB_rhs
      genB _ _ _ = empty

      in ?rest

where the context of a hole ?something_interesting is

   ys : X
   fuel_arg : Fuel
   to_be_deceqed_xs1 : X
   to_be_deceqed_x0 : ()
   cons_fuel : Fuel
   x : ()
   xs : X
------------------------------
something_interesting : Gen (Y (to_be_deceqed_x0 :: to_be_deceqed_xs1) (to_be_deceqed_x0 :: to_be_deceqed_xs1))

However, derivator fills the hole with expression pure {f = Test.DepTyCheck.Gen.Gen} (A {xs = xs} {x = x}), thus producing the error observed.

Treat `So (f x y z)` and `f x y z = True` arguments of constructors specially during derivation

If constructor's argument is of form So $ f x y z or f x y z = True or True = f x y z (for whatever function f and whatever count of its arguments), and if we can inspect this f, we can optimise generation to be more constructive, instead of filtering, during generation.

What we can do is to generate a special data type definition F so that if f : X -> Y -> Z -> Bool, F would be data F : X -> Y -> Z -> Type so that both

equivDir : (x : X) -> (y : Y) -> (z : Z) -> f x y z = True -> F x y z
equivRev : (x : X) -> (y : Y) -> (z : Z) -> F x y z -> f x y z = True

hold.

This means that we can replace So $ f x y z and/or f x y z = True argument by F x y z without any change in the meaning.

Thus, we can replace isJust m = True with IsJust m, and elem x xs with Elem x xs. Generators derived with such a trick would be much more performant, since they would not compute direct function, instead they would constructively generate appropriate values.


Beware of thoughts of general replacement of, say, functions like elem x xs to Elem x xs and corresponding getting the value out of this Elem! In the context of returning True this change seems to be appropriate, where in the general case the order of pattern-matching in a function is significant, whereas the order of generations of constructors in corresponding type is not. This approach is appropriate only for functions with non-overlaping clauses.

Update derivation documentation

Documentation compiles okay, but we do a lot of things differently now comparing to how it is described there.

For example:

  • we don't generate oneOf for a single element list;
  • we balance recusrive and non-recursive data constructors with frequency;
  • we've solved some problems that are described there;
  • README.md
    • we don't need %language ElabReflection mentioning;
    • we don't need a special compiler version;
  • maybe, something more.

Support collection of a model coverage during generation

It would be good to collect some information about model coverage, i.e. coverage in terms of the original data type (for derived generators) and/or coverage in terms of alternatives inside a generator.

The topic is not univocal, there are definitely a ton of ways how such coverage can be gained and how can it be represented. We'd like to increase slowly starting from collecting just which types and type families were present during a series of generation disregarding the position of this generation, say, in recursion; then we can continue with adding information about which particular constructors were actually used in generation in particular types or type families, again disregarding the position. Then, we can further refine this, say, by considering the position in the generation tree, or by considering coverage of constructors separately for different values of indices for indexed type families.

For the first two steps, there are at least the following steps to be done:

  • We need an elaboration script being given a data type family name, showing the list of all data type families that are used inside. This is needed to list all possible items of a coverage of type families (and their constructors).
  • A data structure indexed by a generation tree (i.e., a generator), which shows which alternative was used to generate this or that data value. This data structure would be returned by an alternative variant of unGen. Having additionally description values of oneOfs, using this indexed data structure, we can at least know which data type families were tried to generate a value.
  • Then, we can extend descriptions of oneOfs to contain particular information about particular constructors. It would allow us to evaluate model coverage more precisely. This all, of course, applied only to derived generators, and to generators, whose authors added descriptions corresponding to the derived ones.
  • Maybe, we need some structure on the descriptions, i.e. descriptions would be not just Maybe String, but some our custom FromString type, which may contain type itself, and, possibly, a Vect of subdescriptions for each alternative.
  • Maybe, we could invent some kind of a %macro, which can fill appropriate description of a hand-written generators.

Add support for type-polymorphic data structures

Currently derivation of generators with dependent types is supported, e.g.

gf : Fuel -> (n : Nat) -> Gen MaybeEmpty $ Fin n
gf = deriveGen

But derivation of generators for data types that are polymophic over types is not.
E.g., this does not work:

gl : Fuel -> (a : Type) -> (Fuel -> Gen MaybeEmpty a) => Gen MaybeEmpty $ List a
gf = deriveGen

The whole task can be divided to at least these subtasks:

  • Support for non-higher kinded given type arguments in covariant positions
  • Preferring given type arguments in the ordering in the least-effort constructor derivator in covariant positions
  • Support for non-higher kinded generated type arguments
  • Support for higher-kinded type arguments
  • Support for type arguments in the contravariant positions

Different errors are shown depending on the way derivation algorithm is passed

Steps to reproduce

Consider derivation task that should result in error, say derivation for a type with no constructors, say Void.

Consider two ways of doing the derivation and passing the desired derivation algorithm:

  1. through %hint, e.g.:

    %hint      
    UsedConsDerivator : ConstructorDerivator      
    UsedConsDerivator = LeastEffort      
       
    gv : Fuel -> Gen Void      
    gv = deriveGen
  2. through auto-parameter:

    gw : Fuel -> Gen Void      
    gw = deriveGen @{MainCoreDerivator @{LeastEffort}}

Expected behaviour

Two ways fall with similar errors with comprehensive message.

Observed behaviour

The first one fails with understandable and relevant message:

Error: While processing right hand side of gv. Error during reflection: No constructors found for the type `Builtin.Void`

X:22:6--22:15
 18 | UsedConsDerivator : ConstructorDerivator
 19 | UsedConsDerivator = LeastEffort
 20 | 
 21 | gv : Fuel -> Gen Void
 22 | gv = deriveGen
           ^^^^^^^^^

But the second one fails with irrelevant message:

Error: While processing right hand side of gw. Error during reflection: No arguments in the generator function signature, at least a fuel argument must be present

X:18:6--18:15
 17 | gw : Fuel -> Gen Void
 18 | gw = deriveGen @{MainCoreDerivator @{LeastEffort}}
           ^^^^^^^^^

Incorrect implicit value is supplied to a constructor leading to a unification error

Steps to reproduce

Consider the following module which asks for derivation of a generator:

import Test.DepTyCheck.Gen.Auto
import Decidable.Equality

%default total

%language ElabReflection

%hint
UsedConstructorDerivator : ConstructorDerivator
UsedConstructorDerivator = LeastEffort

data X : Type where
  MkX : Nat -> Bool -> X

data Y : X -> Type where
  MkY : Y (MkX n b)


data Z : Type where
  MkZ : (x : X) ->
           Y x =>
           Z

data W : Z -> Z -> Type where
  MkW : W (MkZ (MkX n False)) (MkZ (MkX n True))

genW : Fuel ->
            (a : Z) ->
            (b : Z) ->
            Gen $ W a b
genW = deriveGen

Expected behaviour

All derives and typechecks OR a comprehensible error message giving a hint on what is wrong.

Observed behaviour

Error: While processing right hand side of genW. Error during reflection: While processing right hand side of $resolved11799 <Main.W>[0, 1]. While processing left hand side of $resolved11799,<Main.W>[0, 1],<<Main.MkW>>. When unifying:
    MkX ?n True
and:
    MkX ?to_be_deceqed^^n0 True
Pattern variable to_be_deceqed^^n0 unifies with: ?n [locals in scope: ^outmost-fuel^, a, b, {arg:926}, {arg:923}, ^fuel_arg^].

Suggestion: Use the same name for both pattern variables, since they unify.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.