A lightweight, brute-force framework for discrete code-based experimentations.
Sometimes, one wants to run an algorithm, a model, a simulation, ... with several combinations of parameters, in order to get insights on its behaviour. Or to find such a combination that maximizes some key figure. In many such cases, trial and error is time-consuming. Without tooling assistance, one often loses track of which combinations were tested, and what is the next parameter to tune up or down for the next scenario.
Basile is a such a tool. It doesn't do much but it's very light and simple to use. Also, it does not depend on a particular framework (e.g. Scikit-Learn's grid search), it is pure Python with no library dependency. It can be used in almost any situation where combinations of discrete parameters must be explored in a systematic way.
There are two important words in the above sentence:
- discrete: each parameter may take a finite number of values, Basile will not sample from continuous intervals. If needed, invervals may can be discretized in any fashion, but it's your job
- systematic (with combinations): Basile is not an optimizer, in that it won't guess the next most promising combination based in the outcome of the previous one. Nor does it pick values at random: it will try every single combination, predictably
The package can be installed with pip
:
pip install basile
Here is a showcase of the possibilities:
from dataclasses import dataclass
from basile import Either, count_realizations, realize_template
@dataclass
class Level2:
field1: str
@dataclass
class Level1:
field1: int
field2: list
field3: Level2
template = Level1(
field1=1,
field2=Either([2, 3, 4], [4, 5, 6]),
field3=Level2(field1=Either('abc', 'def', 'ghi'))
)
n = count_realizations(template)
for i, concrete in enumerate(realize_template(template), 1):
print(f'{i}/{n}', concrete)
The two classes Level1
and Level2
are representations of our parameters. We see that Level1
has a field with type Level2
,
showing that our substitutions can happen in many places, including down in an object hierarchy.
Then we define a template. It is a plain object (Level1
here), only with instances of Either
objects here and there
where we want our substitutions to take place. The template could as well be a list, a tuple or a dictionary, it does not matter,
and in particular there is no Template
type imposing constraints on the code expecting parameters.
The candidate values inside the Either(...)
objects can be anything; the only limitation is that nested candidates are not
supported (Either(1, 2, Either(4, 5)))
is not ok and would not make much sense anyway).
Then we count the number of possible realizations and put it in a variable, to display it in the loop below.
Last comes the loop, where we iterate over the combinations of parameters with the realize_template()
function. The latter
will substitute parameter values in a deep copy of the template, yielding a fully baked object, concrete
(the deep copy
operation is performed at each iteration).
The concrete
object is of type TemplateRealization
. It has two attributes
specification
, which holds a dictionary of substitutions (say, for logging purposes)- keys are access paths to each candidate in the template (see concepts below for more explanations)
- values are the actual substitutions
realization
, which holds an object with the same structure as the template, with no moreEither
s in it. This is the item we would use in the core of our application
Note that Basile only works with iterators, it never materializes combinations in lists or other memory demanding structures. This makes it suitable for scenarios with high cardinalities leading to combinatorial explosion. Process time would certainly be a concern at this point, but not memory.
As mentioned above, Basile does little in itself, and it's quite dumb. Here are some recipes for more elaborate needs.
Often, not all combinations of parameters are admissible. For example, in our toy scenario above, field2 = [2, 3, 4]
could be valid only for field3.field1 in ('abc', 'def')
, and field2 = [4, 5, 6]
for field3.field1 in ('def', 'ghi')
.
However Basile will happily generate a cross product of all candidates, including the invalid combinations.
In such a case, we would inspect the concrete parameters (concrete.realization
), and skip loop iterations when a
certain business rule is not satisfied. Or, better, apply Python's built-in filter()
function to the result of
realize_template()
using a predicate.
When the number of combinations is large, and when each iteration takes time to complete, one might not want to restart the loop from scratch if the program was interrupted in the middle. Basile is 100% deterministic so it's easy to exploit that.
There are many ways to record what combinations were worked out already; here is a hacky yet simple one:
- convert
concrete.specification
(a dictionary) into a sorted tuple of tuples- it will be used to look up a set, so it must be fully hashable
- append it to a file, in any suitable format, after the processing of
concrete.realization
is complete (this is like committing the iteration) - when starting the program, load the file into a set of specifications, and use it to look up the specifications that must be skipped from the loop
Even if the template is altered between executions, with new combinations added or their order changed, there will be
no duplicate execution of a given combination. It won't work though if a new occurrence of Either
is added to the
template, because all specifications will have a new entry, in comparison to the "old" ones loaded at program
initialization.
Of course, if the processing logic changes, it makes sense to start it over again, by deleting the tracking file.
E.g. if candidate values are open files, or database connections... There is no provision, in the API, to dispose of a template realization when the calling program is done with processing it. If it contains values that must be closed or released in order to preserve resources on the machine, you have to do it yourself explicitly. Note that because of the combinatorial nature of the work, each resource will typically be seen by many scenarii, and may not be closed until all iterations are complete.
While the API itself is simple, the inner workings are more complex and can use a bit of explaining. Here is a description of all the concepts used in ExpBasileyriment.
Candidates are possible values for a parameter, an argument... of your program. They are specified using the
Either
object as a placeholder for final values. Those values can be anything, except that they cannot
hold other Either
objects directly or indirectly (no candidates nesting). This is not enforced, but your
processing logic would receive Either
object that it may not be able to handle.
In the toy example above, there are 2 candidates, one with 2 values (lists) and one with 3 values (strings).
A template is an object which can be made variable, by substituting values inside it via candidates. In code, the
only thing that makes it a template is the use of Either
instances somewhere down its hierarchy. Apart from
that, the template can be structured in any possible way, with the following caveats:
- when walking the template object to find candidates, Basile will only recurse inside dataclass instances, lists, tuples and dictionaries (any other types will be left untouched)
- there is no cycle detection to prevent it from going into an infinite loop if there are inter-references between child objects and/or subcollections
When walking the template looking for candidates, Basile keeps track of where it is down the hierarchy. This is
materialized by an access path, which is a list of CandidateAccess
access objects holding:
access_type
: the access mode of the containing element -- dataclass, list, tuple or dictposition
: the position of the current item in its container (attribute name for a containing dataclass, index for a list or tuple, key for a dictionary)
In the example above, the access paths of the candidates can be represented as:
'field2'
for the first one- '
field3.field1'
for the second one
If candidates were in a list, say field4=[Either(10, 11, 12), Either(13, 14, 15)]
, there would be 2 access
paths: 'field4.0'
and 'field4.1'
. It would be the same for a tuple.
Conversely, if candidates were in a dictionary, say field5={'a': Either(20, 21), 'b': Either(22, 23)}
, the
access paths would be: 'field5.a'
and 'field5.b'
.
For candidates buried deeper in the hierarchy, there would be more elements in the dot-separated chains. The representations
above with dots are for human readability, they are not involved in Basile's processing logic. However, they will
appear in the specification
field of realizations, as dictionary keys. So using dictionary keys with dots should not
break anything.
When a placeholder is found, its possible values are associated with the access path of the Either
object
representing it. This association is called a binding. Conceptually, thus, a template can be considered as a list
of bindings, plus all the leftover bits that are not candidates (not made variable).
When iterating over the combinations, a binding is given each possible value from its Either
part, in turn. This
is a binding realization.
When all bindings in a template are combined together, they make up its template realization. Practically,
TemplateRealization
objects also contain a specification, recalling what parameter values were substituted and how.
MIT License (see the LICENSE
file).