neighbour-hoods / rep_lang Goto Github PK

View Code? Open in Web Editor NEW

21.0 8.0 4.0 295 KB

a language for Cultural Articulation

License: Other

Nix 6.34% Rust 92.25% Ragel 1.41%

holochain programming-languages interpreters hindley-milner rust

rep_lang's People

Contributors

Stargazers

Watchers

Forkers

copoesis socioprophet mhuesch socios-linux

rep_lang's Issues

High-level sketch of the problem space

Putting this here to sumarise and collate initial conversations between @mhuesch @sidsthalekar & myself. The intention is to draw a set of specific technical requirements and interdependencies out into issues for project managing; in order to distill an MVP and general sense of feature-priority.

The basic set of functional requirements at this stage are for a language which supports (safe) arbitrary execution of MapReduce queries against a set of normalised source data. There are few components we see as necessary to achieve all of this, some with clearer outlines than others.

The Reputation Vault

An attempt at revising and formalising some terminology (key terms in bold).

What is the Reputation Vault, from a technical perspective?

A Holochain DNA composed of at least 3 cooperating zomes.
The first zome stores reputation calculation documents, authored using Sacred Capital's Reputation DSL; a.k.a. "a formal economic language for culture".
The second zome stores reputation data as normalised, strongly typed datasets which all have unit types to account for the ways in which they are quantified.
The third zome executes stored reputation calculations against source reputation data to derive aggregate reputation scores about the individuals and records in the associated input reputation data.
- It is important that this is the only means by which any reputation scores can be generated in the system. Such an architecture requires agents to consent to storage and acceptance of any reputation calculations that they might make against (and for) others.
Other zomes which will likely be necessary are:
- Those for facilitating remote reputation computation via Holochain capabilities; i.e. asking your friends for their sense of a third party. Relevant as a DNA membrane access control mechanism.
- Those for sharing reputation calculations with others, and for replicating them as 'cultural configurations' within Neighbourhoods. All of the above zomes would deal with local (private) entry storage only; except for this module, which would allow reputational 'lenses' to be replicated outward to peers.

Data normalisation & typing

Data normalisation in the Reputation Vault DNA happens at insertion time, by way of cooperating DNA modules authored to target particular external data sources (eg. likes, REA economic events). Their role is as pluggable content adapters— each takes a list of denormalised input data and normalises it into a native reputation data format and unit type within the vault.

For REA and other economic data, REA will likely be the internal representation used within the Reputation Vault due to its design as an upper ontology. For user-generated metadata (eg. "likes"), the normalised unit type becomes the type of the data represented. Unit types become "likes", "votes" etc. In an ideal implementation, all unit types are preserved in calculations such that COUNT(likes) knows its unit type is "number of likes".

Implement API interface for DSL runtime

Proposed toplevel crate API interfaces from conversations between @mhuesch & @pospi -

Compiler

Language parsing and AST introspection

fn parse_calculation(dsl_document: AsRef<str>) -> Result<poly_rs::Expr>
- throws an error if the document doesn't pass type checking
fn get_calculation_input_type(calculation: poly_rs::Expr) -> poly_rs::Type
fn get_calculation_return_type(calculation: poly_rs::Expr) -> poly_rs::Type
poly_rs::Type::TRelated(op, Type, Type) needs to be implemented to support calculated / derived units

Interpreter

Runtime execution of interpreted reputation language expressions against some set of input data

fn reduce_calculation<R>(calculation: poly_rs::Expr, input_data: Iter<poly_rs::Value>) -> ReputationCalculationResult

where

  struct ReputationCalculationResult  {
      calculation: poly_rs::Expr,
      type: poly_rs::Type,
      value: poly_rs::Value,
  }

Internal operation:

At each stage in the computation (each unwinding of poly_rs::Expr)—

poly_rs::Value type is provided by input_data
expected poly_rs::Type of input is determined by analysing the poly_rs::Expr (possibly, but not necessarily with get_calculation_input_type(calculation)), and validated against poly_rs::Value. Runtime returns an error if incompatible.
poly_rs::Type of output is determined by get_calculation_return_type(calculation) and carried through as the input type to next step in the VM execution

At every step in evaluation the runtime by its nature should have awareness of poly_rs::Type and poly_rs::Value, which enables it to infer compatibility of arithmetic operations between different poly_rs::Type with compatible underlying poly_rs::Value types. (eg. metres / seconds = valid m/s because both inputs are doubles / decimals)

investigate ways to reduce `Sto` clutter

an example of troublesome-looking Sto output:

        924 : CellThunk(Ev { _: VClosure(Name("x"), App(App(Prim(Add), Lit(LInt(1))), Var(Name("acc"))), {Name("_1"): VRef(0), Name("acc"): VRef(923), Name("foldl"): VRef(3)}) })
        925 : CellThunk(UnevExpr { expr: App(Prim(Head), Var(Name("xs"))), env: {Name("_1"): VRef(0), Name("acc"): VRef(118), Name("f"): VRef(116), Name("foldl"): VRef(2), Name("xs"): VRef(120)} })
        926 : CellRedirect(VRef(110))
        927 : CellThunk(Ev { _: VClosure(Name("x"), App(App(Prim(Add), Lit(LInt(1))), Var(Name("acc"))), {Name("_1"): VRef(0), Name("acc"): VRef(926), Name("foldl"): VRef(3)}) })
        928 : CellThunk(UnevExpr { expr: App(Prim(Head), Var(Name("xs"))), env: {Name("_1"): VRef(0), Name("acc"): VRef(110), Name("f"): VRef(108), Name("foldl"): VRef(2), Name("xs"): VRef(112)} })
        929 : CellRedirect(VRef(102))
        930 : CellThunk(Ev { _: VClosure(Name("x"), App(App(Prim(Add), Lit(LInt(1))), Var(Name("acc"))), {Name("_1"): VRef(0), Name("acc"): VRef(929), Name("foldl"): VRef(3)}) })
        931 : CellThunk(UnevExpr { expr: App(Prim(Head), Var(Name("xs"))), env: {Name("_1"): VRef(0), Name("acc"): VRef(102), Name("f"): VRef(100), Name("foldl"): VRef(2), Name("xs"): VRef(104)} })
        932 : CellRedirect(VRef(95))
        933 : CellThunk(Ev { _: VClosure(Name("x"), App(App(Prim(Add), Lit(LInt(1))), Var(Name("acc"))), {Name("_1"): VRef(0), Name("acc"): VRef(932), Name("foldl"): VRef(3)}) })
        934 : CellThunk(UnevExpr { expr: App(Prim(Head), Var(Name("xs"))), env: {Name("_1"): VRef(0), Name("acc"): VRef(95), Name("f"): VRef(93), Name("foldl"): VRef(2), Name("xs"): VRef(97)} })
        935 : CellRedirect(VRef(87))
        936 : CellThunk(Ev { _: VClosure(Name("x"), App(App(Prim(Add), Lit(LInt(1))), Var(Name("acc"))), {Name("_1"): VRef(0), Name("acc"): VRef(935), Name("foldl"): VRef(3)}) })
        937 : CellThunk(UnevExpr { expr: App(Prim(Head), Var(Name("xs"))), env: {Name("_1"): VRef(0), Name("acc"): VRef(87), Name("f"): VRef(85), Name("foldl"): VRef(2), Name("xs"): VRef(89)} })

we see a lot of duplication and (most concerning to me), it appears as though currying-by-default means we cannot strip down the env of a closure.

this comes from a length computation across a large list:

defn foldl
 (fix (lam [foldl]
   (lam [f acc xs]
     (if (null xs)
       acc
       (foldl
         f
         (f acc (head xs))
         (tail xs)))))))

(defn length (foldl (lam [acc x] (+ 1 acc)) 0))

if we had an uncurried language, we'd be able to see that (lam [acc x] (+ 1 acc)) has no free variables and therefore doesn't need an environment. instead, because we are currying, that represents two lambdas, and the variable is "free" inside the body of the inner lambda, because it was bound by the outer (or maybe I have the order switched up).

this makes me think that currying-by-default is worth reconsidering.

add proper option parsing to the various executables

this would allow us to add a --verbose/-v flag which could emit debug info. most of the time, we probably don't want to see that.

rle emits a potentially gigantic printout of the Sto at the end of an evaluation. we probably only want that when it's necessary.

in rli, we could support toggling verbosity (and perhaps other things) inside of the REPL, so that the whole session wouldn't have to be dumped in order to switch.

Proof execution against custom types provided to the interpreter

This could form part of the test suite for #5 as a first pass.

Refer to #5 (comment) for ideas but there are a few implementation options to consider.

At a low level there are two user stories to fulfill here-

As an integrator of the DSL, I want to be able to define a new reputation datatype and its structure in my own third-party code.
As an integrator of the DSL, I want the interpreter to throw an error if the wrong datatype is used as an input to a reputation calculation

Proof multi-step computations

integration test for calculating an AVG() by first calculating SUM and COUNT before dividing one by the other

investigate Haskell’s suitability

N.B.

this is further-future, not something to act on yet.

synopsis

right now, (I believe) Holochain only has an HDK for Rust.

it seems possible that Haskell might be useful to us, because of easy access to things like type classes and higher-order functional constructs. instead of having to implement a programming language in Rust, it’s possible we could do an embedded DSL in Haskell, which would allow us to use Haskell’s type system and programming constructs (e.g. monads) as a “shallow embedding”.

the idea here is to scope the work required to use Haskell for the reputation DSL.

additional motivation that is pretty subjective/personal

I @mhuesch am better with Haskell than Rust, so dev work might go significantly faster 🙂

main areas of work needed

Haskell interpreter / embedded DSL architecture

the need:
- we need something which can be compiled to WASM, which can interpret user-provided programs.
- we would like to have higher-order constructs & perhaps things like monads
- why not use Haskell itself?
- thus we would use hint to interpret Haskell code. we would compile a WASM binary which contained hint and thus could interpret Haskell code inside of Holochain.
as I demonstrated here we could use hint to load Haskell modules and then interpret their contents.
- this would allow us to load Haskell modules out of a Holochain DHT, then pass them to hint to parse/typecheck/interpret.
the interface might look like:
- we (authors of DSL) define machinery which handles validation, type marshalling, interoperation with Reputation Vault, etc.
- users would provide reputation computation programs, as Haskell modules which define certain specially-named functions (we, authors of DSL, would decide these names and communicate them).
- hint would interpret user-provided modules, in conjunction with DSL-core modules, in order to create & execute the reputation programs with "live" Holochain data.

WASM compilation of Haskell

Tweag have asterius which compiles Haskell -> WASM. it's a modified version of GHC. I have not yet played with it much, besides failing to build this branch on NixOS.

Holochain HDK implementation (ABI compatibility with Holochain)

this is maybe the biggest question mark? Holochain has been moving fast and I'm not sure of stability of APIs with RSM.

we would need to write an HDK for Haskell, which would require some fairly understanding of Holochain's core and how to interface between zomes written in different languages.

overall

to me the question is whether the cost of supporting Haskell on Holochain (building HDK, getting WASM compilation working) is worth the payoff of having Haskell's power. I'm not sure what the answer is.

implement incremental lists via fusion & stepped sub-expression interpretation

this ticket sketches out a possible approach for interpreting incremental lists in a principled way.

it supersedes #12.

motivation for fusion

we want to be able to define functions which process potentially large quantities of data.

for example:

(defn mean
  (lam [xs]
    (/ (foldl + 0 xs)
       (foldl (lam [acc x] (+ acc 1)) 0 xs))))

suppose we want to apply mean a large number of values - enough to cause an out of memory error on the machine we are using if all values are initially loaded into memory. in principle, there is no reason to load all values - both folds could be interpreted in constant space, as all they have to maintain is an integer accumulating parameter.

now take another example:

(defn sqSum
  (lam [xs]
    (foldl + 0 (map (lam [x] (* x x)) xs))))

we would also like this function to run in constant space, and be able to process OOM-inducing volumes of values. however, a naive interpreter would insist on producing the incremental list of "squares" (the output of map) before feeding that to foldl. so the intermediate list would trigger an OOM and ruin our fun.

now consider a rewritten sqSum which processes everything "all in one go" and thus avoids the OOM-y intermediate list:

(defn sqSum
  (lam [xs]
    (foldl (lam [acc x] (+ acc (* x x))) 0 xs)))

this definition represents a "fusion" of the separate foldl and map of the previous definition.

goal of fusion

allow the programmer to write arbitrary pipelines of list-processing operations, and automatically combine them into (ideally) single foldls, which can be incrementally interpreted across large datasets.

motivation for "stepped sub-expression interpretation"

(working name, subject to revision, open to suggestions)

let's revisit mean:

(defn mean
  (lam [xs]
    (/ (foldl + 0 xs)
       (foldl (lam [acc x] (+ acc 1)) 0 xs))))

how would we interpret this, incrementally, over a massive dataset?

sketch of implementation approach for "stepped sub-expression interpretation"

in the above example, we really mean interpreting the body, in an environment where xs is understood to not be a normal list (represented in memory) but an iterator-stream-incremental-thing, which we can pull one value out of at a time. we want to interpret:

(/ (foldl + 0 xs)
   (foldl (lam [acc x] (+ acc 1)) 0 xs))

of note, we are performing two folds across the same list. do we need to interpret things separately and thus run the risk of duplicating calls to the outside world to provision the input values? it is likely out of scope for now, but we could perform automated detection of shared data, and interpret multiple subexpressions which consume the same data simultaneously - in this case we would interpret both foldls at the same time, feeding in each value of xs to each.

let's leave the sharing optimization aside for now. if we are ok with duplicating the "fetching" work, we could take the following approach:

(note: this part is rough)

capture the environment of (foldl + 0 xs)
capture the environment of (foldl (lam [acc x] (+ acc 1)) 0 xs)
create freshnames to represent the two folds and substitute them into the "toplevel program": (/ var_0 var_1)
interpret var_0 to a value (using the captured environment from (1))
interpret var_1 to a value (using the captured environment from (2))
substitute those into the original toplevel environment and interpret (/ var_0 var_1)

by chunking up the program into subexpressions which process lists, and interpreting those incrementally, we can then take their outputs and interpret the remaining program which depends on those values.

requirement of fusion for maximal usefulness of "stepped sub-expression interpretation" (SSEI)

note that we do need fusion to interpret the above example of mean. both foldls are in position to be interpreted incrementally.

however, the original sumSq from above is not a "single fold" - it has an intermediate list which would cause us to OOM on a large dataset.

if we were to fuse sumSq to the collapsed "single foldl" version above, we would then be able to interpret it using the approach from the previous section. it would only have 1 subexpression of interest, and no "remaining program" to interpret after that was finished.

non-necessity of fusion for early prototyping / early adopters

note that we do not need fusion for the SSEI system to be usable for complex programs. however it will be up to the programmer to manually do the work of transforming programs into "single foldl"s (as in the case of sumSq above). for small programs this may be fine, but for larger programs it may be quite tedious & error-prone.

for the SSEI system to work, all we would need is a program which is a closure/function, which receives a number of list input arguments, and performs "single foldl"s on them, reducing them to "atomic" values (i.e. non-lists).

transition off forked rustyline once wasm support is merged mainline

kkawakam/rustyline#620 adds wasm support, and kkawakam/rustyline#604 maybe also does.

once one of them merges, update our code to use mainline rustyline again, instead of the fork-dep introduced in #77.

concrete syntax: support non-letter characters in names

right now the name (variable) parser only allows letters.

we probably want to support _, -, and numbers, at minimum.

global: bump package versions to latest

we're now a bit dated in a number of packages.

revamp `cargo2nix` workflow in light of no toplevel workspace

on branch drop-workspace I have dropped the toplevel Cargo.toml workspace, because of a conflict with rep_interchange's toplevel one (Cargo insists there can only be one).

we had previously used Cargo.nix in the root of the repo, built from running cargo2nix in that root dir.

now we are running with 3 separate packages in the respective subdirectories. we'll have to adapt (I think) to having 3 different Cargo.nix files, generated from each respective Cargo.toml in each subdir.

add `head` and `tail` primitives to facilitate easy (tho unsafe) list processing

head takes a list and returns the first element (or errors if the list is null).
tail takes a list and returns the all but the first element (or errors if list is null).

this is less principled and less safe than adding case analysis (see #18), however it is way easier.

add support for user-defined datatypes

implement integer division

long overdue - this would allow us to compute a mean (average), for example.

example program:

(defn mean
  (lam [xs]
    (/ (foldl + 0 xs)
       (foldl (lam [acc x] (+ acc 1)) 0 xs))))

(mean (list 1 2 3 4))

convert CI to use nix

currently we use a mix of:

nix to provision dev environment
nix to build (via Cargo.nix)
Github Actions for CI

for implementing #36, it would be significantly easier (and less likely to introduce future bugs/issues) if we ran CI based on the same, single nix derivation set.

this perhaps means we can't use Github Actions - I'm not sure yet. setting up hydra might be nice, tho is a larger task.

abstract / concrete syntax interface invariant checking

I want to note an issue with interfaces - if folks define their own parsers (via their own concrete syntax crate, an ability which we intend to support later) then we will need to define some invariants for the construction of abstract syntax trees. for one, avoiding reserved keywords. if we expose the Expr type, then "invalid" names could be inserted into it.

one way around this would be to have a function fn check_expr(expr: &Expr) -> bool which runs checks on whatever expr is passed in from a concrete syntax module, and asserts that it works with the runtime/abstract syntax machinery.

Proof execution of custom type compatibility / incompatibility

Two different custom datatypes should be declared (eg. "likes" and "upvotes")
A calcuation expecting "likes" should error when provided "upvotes"
An error should be thrown if "likes" and "upvotes" data is computed against each other

set up a CI checks on PRs

we have some limited tests that can be run via cargo test. it would be nice to run those on PRs as a check before merging.

potentially linting via cargo fmt as well.

I am partial to Nix for doing build-y stuff, but we could also go pure-cargo and that ought to work too.

add automated checks to CI

nice to have:

run cargo2nix and check for a zero diff
- this will ensure the file stays fresh
run cargo build and check for a zero diff

relates to neighbour-hoods/social_sensemaker#14

support loading of source files in `rli`

akin to :l <filename> in ghci (https://ghc.readthedocs.io/en/8.0.1/ghci.html#loading-source-files), this would help with loading files and tinkering with them interactively.

this would be made more-legit by having a module system.

investigate whether garbage production can be minimized

migrating a todo from #40 to an issue here:

diagnose cause of duplicated entries in sto_vec and ideally remove them

these two match blocks create some of the extra garbage. I am not sure how to eliminate them at present, though.

they represent the notion that "interpretation of a thunk produces intermediate store allocations, including the allocation of the result of evaluation, which is itself stored in the store. thus at the end, we perform a copy of the result, setting the unevaluated thunk to an evaluated thunk which wraps the value.

possible ideas

the approach demonstrated in https://github.com/sacredcapital/rep_lang/tree/sto_cell_redirect seems quite viable and like it would close this issue.

interpreting `fix` overflows stack

N.B.: it's not actually clear to me we want to implement fix (or that it is possible to do in Rust, at least without a significant rework of the term environment? but I'm not sure) because we may not want to allow arbitrary unbounded recursion.

description

$ cat ./examples/ex6.poly
(defn fac
  (fix (lam [fact n]
         (if (== n 0)
             1
             (* n (fact (- n 1)))))))

(fac 7)

$ cargo run --bin poly ./examples/ex6.poly
    Finished dev [unoptimized + debuginfo] target(s) in 0.01s
     Running `target/debug/poly ./examples/ex6.poly`
Scheme([], TCon("Int"))

Env({Name("fac"): Scheme([], TArr(TCon("Int"), TCon("Int")))})

## potentially relevant leads

https://github.com/andrejbauer/plzoo/blob/master/src/poly/interpret.ml#L78-L80
^ this uses a `rec` syntactic form which holds the name (which will be recursively bound) and the expr to which it will be bound. that seems easier than our approach, which just checks that the `Expr` enclosed in a `fix` call is a function type, and passes it to itself. since we don't know what name the inner function will bind itself to, it's difficult to do a fixpoint-y substitution on the interpreter environment. we could consider switching to the `rec` form.

lazy evaluation in Rust: https://doc.rust-lang.org/core/lazy/struct.Lazy.html

I think that the main "trick" we need is a bit of laziness to allow the knot to be tied. I'm not sure how to wire that into the interpreter guts.

thread 'main' has overflowed its stack
fatal runtime error: stack overflow
[1]    18277 abort (core dumped)  cargo run --bin poly ./examples/ex6.poly

this is presumably because my very-literal transcription of the tutorial implementation was too literal and overlooked Haskell's laziness and Rust's eagerness of evaluation.

migrated from mhuesch/poly-rs#16.

add `fix` back to the language

fix was removed in #26 because of #17. the issue there was that the strict interpreter entered an infinite recursive loop by expanding fix to an expression which contained another fix. because of strict evaluation, this would loop forever.

in the lazy interpreter, my hunch is that we can re-add fix pretty much exactly how it was before, and things will work. this is because in lazy evaluation the recursive evaluation of the "inner" fix won't occur until the recursive use of the fixpoint-bound function is forced. I think this means we're ok.

rough notes / progress log

I am creating this to post updates on what I'm looking at or thinking about, in a way that's not super refined. so reader beware!

investigate lazy evaluation as a means of implementing incremental list processing

this is an alternative approach to #25, and it seems like it might be considerably cleaner.

it will be implemented by a succession of PRs, as explained in the first: #33

feature: record syntax ideas / scopedlabels

extensible records would be 🔥

we currently have product types, with Pair (2-tuples).

the approach outlined in this paper might lead to a nice mapping between underlying Rust structs (other data in Holochain) and types in the poly-rs language.

a possible sketch.
. would be a the field accessor syntax, in keeping with Lisp-y style.

((lam [r]
    (. r field_x))
 {[field_x 1]
  [field_y 2]
  opt_expr_for_record_to_extend})

misc related-ish links

https://gist.github.com/nikita-volkov/6977841
https://www.reddit.com/r/haskell/comments/1oicmu/anonymous_records_a_solution_to_the_problems_of/ (<- some criticism of the approach ⬆️)

migrated from mhuesch/poly-rs#14.

Demonstrate reputation calculation over Holochain data

Integrate the reputation DSL with some REA event data coming from a Holo-REA module and make calculation execution workable

strategize about implementation for `Type::TRelated`

try to figure out what a good structure might look like.

relates to #6.

import `poly-rs` repo into this repository; rename crates

import plan

creating a separate issue for this as it seems distinct from #8, but is a prerequisite for it.

current poly-rs sits over on my account. my suggested plan is:

once I complete the work on toplevel-api over on the other repo, to push the repository from there to this repository. e.g:

cd /tmp
git clone https://github.com/mhuesch/poly-rs
cd poly-rs
git remote rm origin
git remote add origin [email protected]:sacredcapital/ReputationDSL.git
git branch -M main
git push -u origin main

that would import all files and the whole git history.

then we can transfer (probably manually, since it doesn't seem possible to import issues across organizations) the issues from the poly-rs repo over here.

naming

what should we name the cargo crates for this repo?

I like short names so r_dsl could be nice, but it is a total acronym. we would also want a separate name for the parser (as part of #8).

test-suite feature: generate well-typed `Expr`s

this is a little tricky.
two approaches come to mind:

generate terms using the untyped generator, call infer them on them, & discard not-well-typed ones.

seems inefficient - we will probably rarely generate a well-typed Expr of any non-miniscule size.

define a new type-directed generator.

seems hard, but doable.
here we might generate a Type first, and then use that to guide generation of an Expr which fits the type.
I imagine there's research literature about doing this, but I am currently ignorant of it.

migrated from mhuesch/poly-rs#13.

investigate / scope adding a module system

we could perhaps take inspiration from OCaml's module system (https://dev.realworldocaml.org/first-class-modules.html).

we will likely need to parameterize the module system by the module identfier, so that the language can be used both:

in a regular filesystem-based context (i.e. where modules are "implemented" using files & directories, where an identifier is a filepath)
and in Holochain (i.e. where modules are "implemented" using Holochain Entrys, which are identified using EntryHashes)

the latter use represents a kind of "agent-centric" language, where code is not stored in a "data centric" central repository with a global naming scheme. as such it might be kind of novel.

Implement measures for memory efficiency / bounding on large datasets

concrete syntax: add line/col numbers to parse errors

position (line/column) of parse errors
[no] ... more helpful output?

I'm just going to scope this to line/col numbers.

fancier error messages require significantly more thought and probably can be deferred until a while later.

feature: sum types & match/case syntax

I think match/case is necessary for implementing sum types.

how else can you do case analysis?

(match r
  [ (L l_body) ((+ 1) l_body) ]
  [ (R r_body) 7 ])

^ we might not even have to do any deeper pattern matching - 1 level only?

migrated from mhuesch/poly-rs#15.

create API interface tests to target "incremental lists"

as background:
it is always nice to have more tests.
this project could use more tests.
specifically for the toplevel API.

but to scope the issue and make it complete-able, here we'll focus just on arguments (to the toplevel) of shape "List Value where the list is quite long".

since we currently store all lists fully in memory, this will potentially cause OOMs.

procedure

define a test which features a poly-rs program like so:

(defn sum (foldl + 0))
sum

(meaning that we have a definition and then a reference to it in the program body).

then feed a gigantic list in through input_data in the toplevel API.

try to limit the Rust runtime heap, so that it will OOM.

this will give us a bar for failure; next we can work on passing the test by defining a proper iterator-based incremental list.

Automatic conversions between units using TCons to represent OM-typed data

Rather than implementing OM itself in Rust (something akin to Dimensional in Haskell), a simpler (but much less performant) option may be to embed an RDF reasoner that leverages the OM RDF ontology directly.

See https://github.com/HajoRijgersberg/OM#libs for some implementations of OM unit conversions in other languages.

Proof single-step computations

integration test for calculating the SUM() of a set of values

add basic boolean operators

https://en.wikipedia.org/wiki/Boolean_algebra#Basic_operations:

(: and (Bool -> Bool -> Bool))
(: or (Bool -> Bool -> Bool))
(: not (Bool -> Bool))

could add the secondary operations too, or just implement them in stdlib: https://en.wikipedia.org/wiki/Boolean_algebra#Secondary_operations

Multi-crate refactoring

We want the parser to be separate from the runtime so that others can implement their own parsers.

potential optimization: look for fully applied lambdas and substitute them all at once

motivation

our interpreter currently creates unnecessary closures.
this may or may not be a big deal.
I'd say it seems likely that other issues are more pressing than interpreter performance.

since this is semi-fresh in my mind, I'm recording my thoughts here in case we want to investigate this in the future.

core issue

currently if we have ((lam [x y] (+ x y)) 1 2), we will create 2 closures.
we'll interpret the first lambda to get a VClosure.
then we'll interpret the application of the lambda, do a substitution with 1 for x into the body, then get out another VClosure.
then we will interpret that body with 2 substituted for y, and be done.
so we will do 2 packaging-up-s of closures, and 2 substitutive interpretations.
in this case, no closure is actually needed to evaluate the arguments or the closure body (since 1, 2, and (+ x y) have no free variables - x & y are bound by the lambda).

this seems like a waste, when we could just substitute both at once, and not package up either closure.
IIRC, this is a safe optimization, but I can't remember the name for it...

I think they key is determining whether the arguments to the function applications have any free variables.
and perhaps also determining what variables are returned...

next action

try to find this optimization, perhaps in a textbook about interpreters. seems much easier to find it than to try to reinvent the optimization.

potential approach to testing

generate well-typed Exprs (see #21).
evaluate them to Values, using the unoptimized interpreter and the optimized interpreter.
assert that both versions produce the same output.
derive equality on Value (possible??) in order to decide equality.
this should be doable given the current definition of Value, which would be a strong form of equality (I forget all the names for different equality-s, but this is a very literal form, where we would ideally just derive Eq).

after demonstrating equivalence, using 2 modules, replace the unoptimized module with the optimized one.
OR, hide the optimization behind some feature flag / optimization level.

migrated from mhuesch/poly-rs#12.

implement garbage collection

the Sto introduced in #35 is our "heap" - where much of rep_lang's allocated memory lives. much of this memory might go out of scope and not be needed, as evaluation of a program progresses.

in order to evaluate large programs, or ones where the amount of processed data exceeds machine RAM/swap (which is a required goal of the language), we will need to be able to collect garbage as we go.

implement "incremental lists"

currently:

#[derive(Clone)]
pub enum Value {
    ...
    VList(Vec<Value>),
    ...
}

change to

#[derive(Clone)]
pub enum Value {
    ...
    VList(Iterator<Item = Value>),
}

figure out what places in the codebase clone Value
figure out how to clone iterators (is it possible?)

add numeric inequality operators

(: < (Int -> Int -> Bool))
(: > (Int -> Int -> Bool))

optionally, <=, >=, although those can be implemented in stdlib using the above primitives, and then boolean primitives (see #44)