neighbour-hoods / rep_lang Goto Github PK
View Code? Open in Web Editor NEWa language for Cultural Articulation
License: Other
a language for Cultural Articulation
License: Other
Putting this here to sumarise and collate initial conversations between @mhuesch @sidsthalekar & myself. The intention is to draw a set of specific technical requirements and interdependencies out into issues for project managing; in order to distill an MVP and general sense of feature-priority.
The basic set of functional requirements at this stage are for a language which supports (safe) arbitrary execution of MapReduce queries against a set of normalised source data. There are few components we see as necessary to achieve all of this, some with clearer outlines than others.
An attempt at revising and formalising some terminology (key terms in bold).
What is the Reputation Vault, from a technical perspective?
Data normalisation in the Reputation Vault DNA happens at insertion time, by way of cooperating DNA modules authored to target particular external data sources (eg. likes, REA economic events). Their role is as pluggable content adapters— each takes a list of denormalised input data and normalises it into a native reputation data format and unit type within the vault.
For REA and other economic data, REA will likely be the internal representation used within the Reputation Vault due to its design as an upper ontology. For user-generated metadata (eg. "likes"), the normalised unit type becomes the type of the data represented. Unit types become "likes", "votes" etc. In an ideal implementation, all unit types are preserved in calculations such that COUNT(likes)
knows its unit type is "number of likes".
Proposed toplevel crate API interfaces from conversations between @mhuesch & @pospi -
Language parsing and AST introspection
fn parse_calculation(dsl_document: AsRef<str>) -> Result<poly_rs::Expr>
fn get_calculation_input_type(calculation: poly_rs::Expr) -> poly_rs::Type
fn get_calculation_return_type(calculation: poly_rs::Expr) -> poly_rs::Type
poly_rs::Type::TRelated(op, Type, Type)
needs to be implemented to support calculated / derived unitsRuntime execution of interpreted reputation language expressions against some set of input data
fn reduce_calculation<R>(calculation: poly_rs::Expr, input_data: Iter<poly_rs::Value>) -> ReputationCalculationResult
where
struct ReputationCalculationResult {
calculation: poly_rs::Expr,
type: poly_rs::Type,
value: poly_rs::Value,
}
At each stage in the computation (each unwinding of poly_rs::Expr
)—
poly_rs::Value
type is provided by input_data
poly_rs::Type
of input is determined by analysing the poly_rs::Expr
(possibly, but not necessarily with get_calculation_input_type(calculation)
), and validated against poly_rs::Value
. Runtime returns an error if incompatible.poly_rs::Type
of output is determined by get_calculation_return_type(calculation)
and carried through as the input type to next step in the VM executionAt every step in evaluation the runtime by its nature should have awareness of poly_rs::Type
and poly_rs::Value
, which enables it to infer compatibility of arithmetic operations between different poly_rs::Type
with compatible underlying poly_rs::Value
types. (eg. metres / seconds = valid m/s because both inputs are doubles / decimals)
an example of troublesome-looking Sto
output:
924 : CellThunk(Ev { _: VClosure(Name("x"), App(App(Prim(Add), Lit(LInt(1))), Var(Name("acc"))), {Name("_1"): VRef(0), Name("acc"): VRef(923), Name("foldl"): VRef(3)}) })
925 : CellThunk(UnevExpr { expr: App(Prim(Head), Var(Name("xs"))), env: {Name("_1"): VRef(0), Name("acc"): VRef(118), Name("f"): VRef(116), Name("foldl"): VRef(2), Name("xs"): VRef(120)} })
926 : CellRedirect(VRef(110))
927 : CellThunk(Ev { _: VClosure(Name("x"), App(App(Prim(Add), Lit(LInt(1))), Var(Name("acc"))), {Name("_1"): VRef(0), Name("acc"): VRef(926), Name("foldl"): VRef(3)}) })
928 : CellThunk(UnevExpr { expr: App(Prim(Head), Var(Name("xs"))), env: {Name("_1"): VRef(0), Name("acc"): VRef(110), Name("f"): VRef(108), Name("foldl"): VRef(2), Name("xs"): VRef(112)} })
929 : CellRedirect(VRef(102))
930 : CellThunk(Ev { _: VClosure(Name("x"), App(App(Prim(Add), Lit(LInt(1))), Var(Name("acc"))), {Name("_1"): VRef(0), Name("acc"): VRef(929), Name("foldl"): VRef(3)}) })
931 : CellThunk(UnevExpr { expr: App(Prim(Head), Var(Name("xs"))), env: {Name("_1"): VRef(0), Name("acc"): VRef(102), Name("f"): VRef(100), Name("foldl"): VRef(2), Name("xs"): VRef(104)} })
932 : CellRedirect(VRef(95))
933 : CellThunk(Ev { _: VClosure(Name("x"), App(App(Prim(Add), Lit(LInt(1))), Var(Name("acc"))), {Name("_1"): VRef(0), Name("acc"): VRef(932), Name("foldl"): VRef(3)}) })
934 : CellThunk(UnevExpr { expr: App(Prim(Head), Var(Name("xs"))), env: {Name("_1"): VRef(0), Name("acc"): VRef(95), Name("f"): VRef(93), Name("foldl"): VRef(2), Name("xs"): VRef(97)} })
935 : CellRedirect(VRef(87))
936 : CellThunk(Ev { _: VClosure(Name("x"), App(App(Prim(Add), Lit(LInt(1))), Var(Name("acc"))), {Name("_1"): VRef(0), Name("acc"): VRef(935), Name("foldl"): VRef(3)}) })
937 : CellThunk(UnevExpr { expr: App(Prim(Head), Var(Name("xs"))), env: {Name("_1"): VRef(0), Name("acc"): VRef(87), Name("f"): VRef(85), Name("foldl"): VRef(2), Name("xs"): VRef(89)} })
we see a lot of duplication and (most concerning to me), it appears as though currying-by-default means we cannot strip down the env of a closure.
this comes from a length computation across a large list:
defn foldl
(fix (lam [foldl]
(lam [f acc xs]
(if (null xs)
acc
(foldl
f
(f acc (head xs))
(tail xs)))))))
(defn length (foldl (lam [acc x] (+ 1 acc)) 0))
if we had an uncurried language, we'd be able to see that (lam [acc x] (+ 1 acc))
has no free variables and therefore doesn't need an environment. instead, because we are currying, that represents two lambdas, and the variable is "free" inside the body of the inner lambda, because it was bound by the outer (or maybe I have the order switched up).
this makes me think that currying-by-default is worth reconsidering.
this would allow us to add a --verbose
/-v
flag which could emit debug info. most of the time, we probably don't want to see that.
rle
emits a potentially gigantic printout of the Sto
at the end of an evaluation. we probably only want that when it's necessary.
in rli
, we could support toggling verbosity (and perhaps other things) inside of the REPL, so that the whole session wouldn't have to be dumped in order to switch.
This could form part of the test suite for #5 as a first pass.
Refer to #5 (comment) for ideas but there are a few implementation options to consider.
At a low level there are two user stories to fulfill here-
AVG()
by first calculating SUM
and COUNT
before dividing one by the otherthis is further-future, not something to act on yet.
right now, (I believe) Holochain only has an HDK for Rust.
it seems possible that Haskell might be useful to us, because of easy access to things like type classes and higher-order functional constructs. instead of having to implement a programming language in Rust, it’s possible we could do an embedded DSL in Haskell, which would allow us to use Haskell’s type system and programming constructs (e.g. monads) as a “shallow embedding”.
the idea here is to scope the work required to use Haskell for the reputation DSL.
I @mhuesch am better with Haskell than Rust, so dev work might go significantly faster 🙂
hint
to interpret Haskell code. we would compile a WASM binary which contained hint
and thus could interpret Haskell code inside of Holochain.hint
to load Haskell modules and then interpret their contents.
hint
to parse/typecheck/interpret.hint
would interpret user-provided modules, in conjunction with DSL-core modules, in order to create & execute the reputation programs with "live" Holochain data.Tweag have asterius which compiles Haskell -> WASM. it's a modified version of GHC. I have not yet played with it much, besides failing to build this branch on NixOS.
this is maybe the biggest question mark? Holochain has been moving fast and I'm not sure of stability of APIs with RSM.
we would need to write an HDK for Haskell, which would require some fairly understanding of Holochain's core and how to interface between zomes written in different languages.
to me the question is whether the cost of supporting Haskell on Holochain (building HDK, getting WASM compilation working) is worth the payoff of having Haskell's power. I'm not sure what the answer is.
this ticket sketches out a possible approach for interpreting incremental lists in a principled way.
it supersedes #12.
we want to be able to define functions which process potentially large quantities of data.
for example:
(defn mean
(lam [xs]
(/ (foldl + 0 xs)
(foldl (lam [acc x] (+ acc 1)) 0 xs))))
suppose we want to apply mean
a large number of values - enough to cause an out of memory error on the machine we are using if all values are initially loaded into memory. in principle, there is no reason to load all values - both folds could be interpreted in constant space, as all they have to maintain is an integer accumulating parameter.
now take another example:
(defn sqSum
(lam [xs]
(foldl + 0 (map (lam [x] (* x x)) xs))))
we would also like this function to run in constant space, and be able to process OOM-inducing volumes of values. however, a naive interpreter would insist on producing the incremental list of "squares" (the output of map
) before feeding that to foldl
. so the intermediate list would trigger an OOM and ruin our fun.
now consider a rewritten sqSum
which processes everything "all in one go" and thus avoids the OOM-y intermediate list:
(defn sqSum
(lam [xs]
(foldl (lam [acc x] (+ acc (* x x))) 0 xs)))
this definition represents a "fusion" of the separate foldl
and map
of the previous definition.
allow the programmer to write arbitrary pipelines of list-processing operations, and automatically combine them into (ideally) single foldl
s, which can be incrementally interpreted across large datasets.
(working name, subject to revision, open to suggestions)
let's revisit mean
:
(defn mean
(lam [xs]
(/ (foldl + 0 xs)
(foldl (lam [acc x] (+ acc 1)) 0 xs))))
how would we interpret this, incrementally, over a massive dataset?
in the above example, we really mean interpreting the body, in an environment where xs
is understood to not be a normal list (represented in memory) but an iterator-stream-incremental-thing, which we can pull one value out of at a time. we want to interpret:
(/ (foldl + 0 xs)
(foldl (lam [acc x] (+ acc 1)) 0 xs))
of note, we are performing two folds across the same list. do we need to interpret things separately and thus run the risk of duplicating calls to the outside world to provision the input values? it is likely out of scope for now, but we could perform automated detection of shared data, and interpret multiple subexpressions which consume the same data simultaneously - in this case we would interpret both foldl
s at the same time, feeding in each value of xs
to each.
let's leave the sharing optimization aside for now. if we are ok with duplicating the "fetching" work, we could take the following approach:
(note: this part is rough)
(foldl + 0 xs)
(foldl (lam [acc x] (+ acc 1)) 0 xs)
(/ var_0 var_1)
var_0
to a value (using the captured environment from (1))var_1
to a value (using the captured environment from (2))(/ var_0 var_1)
by chunking up the program into subexpressions which process lists, and interpreting those incrementally, we can then take their outputs and interpret the remaining program which depends on those values.
note that we do need fusion to interpret the above example of mean
. both foldl
s are in position to be interpreted incrementally.
however, the original sumSq
from above is not a "single fold" - it has an intermediate list which would cause us to OOM on a large dataset.
if we were to fuse sumSq
to the collapsed "single foldl" version above, we would then be able to interpret it using the approach from the previous section. it would only have 1 subexpression of interest, and no "remaining program" to interpret after that was finished.
note that we do not need fusion for the SSEI system to be usable for complex programs. however it will be up to the programmer to manually do the work of transforming programs into "single foldl"s (as in the case of sumSq
above). for small programs this may be fine, but for larger programs it may be quite tedious & error-prone.
for the SSEI system to work, all we would need is a program which is a closure/function, which receives a number of list input arguments, and performs "single foldl"s on them, reducing them to "atomic" values (i.e. non-lists).
kkawakam/rustyline#620 adds wasm support, and kkawakam/rustyline#604 maybe also does.
once one of them merges, update our code to use mainline rustyline
again, instead of the fork-dep introduced in #77.
right now the name (variable) parser only allows letters.
we probably want to support _
, -
, and numbers, at minimum.
we're now a bit dated in a number of packages.
Gen
, which I don't want to figure out rn)on branch drop-workspace
I have dropped the toplevel Cargo.toml
workspace, because of a conflict with rep_interchange's toplevel one (Cargo insists there can only be one).
we had previously used Cargo.nix
in the root of the repo, built from running cargo2nix
in that root dir.
now we are running with 3 separate packages in the respective subdirectories. we'll have to adapt (I think) to having 3 different Cargo.nix
files, generated from each respective Cargo.toml
in each subdir.
head
takes a list and returns the first element (or errors if the list is null).
tail
takes a list and returns the all but the first element (or errors if list is null).
this is less principled and less safe than adding case analysis (see #18), however it is way easier.
long overdue - this would allow us to compute a mean (average), for example.
example program:
(defn mean
(lam [xs]
(/ (foldl + 0 xs)
(foldl (lam [acc x] (+ acc 1)) 0 xs))))
(mean (list 1 2 3 4))
currently we use a mix of:
nix
to provision dev environmentnix
to build (via Cargo.nix
)for implementing #36, it would be significantly easier (and less likely to introduce future bugs/issues) if we ran CI based on the same, single nix
derivation set.
this perhaps means we can't use Github Actions - I'm not sure yet. setting up hydra
might be nice, tho is a larger task.
I want to note an issue with interfaces - if folks define their own parsers (via their own concrete syntax crate, an ability which we intend to support later) then we will need to define some invariants for the construction of abstract syntax trees. for one, avoiding reserved keywords. if we expose the Expr
type, then "invalid" names could be inserted into it.
one way around this would be to have a function fn check_expr(expr: &Expr) -> bool
which runs checks on whatever expr is passed in from a concrete syntax module, and asserts that it works with the runtime/abstract syntax machinery.
we have some limited tests that can be run via cargo test
. it would be nice to run those on PRs as a check before merging.
potentially linting via cargo fmt
as well.
I am partial to Nix for doing build-y stuff, but we could also go pure-cargo and that ought to work too.
nice to have:
cargo2nix
and check for a zero diff
cargo build
and check for a zero diffrelates to neighbour-hoods/social_sensemaker#14
akin to :l <filename>
in ghci (https://ghc.readthedocs.io/en/8.0.1/ghci.html#loading-source-files), this would help with loading files and tinkering with them interactively.
this would be made more-legit by having a module system.
migrating a todo from #40 to an issue here:
diagnose cause of duplicated entries in sto_vec and ideally remove them
these two match blocks create some of the extra garbage. I am not sure how to eliminate them at present, though.
they represent the notion that "interpretation of a thunk produces intermediate store allocations, including the allocation of the result of evaluation, which is itself stored in the store. thus at the end, we perform a copy of the result, setting the unevaluated thunk to an evaluated thunk which wraps the value.
the approach demonstrated in https://github.com/sacredcapital/rep_lang/tree/sto_cell_redirect seems quite viable and like it would close this issue.
N.B.: it's not actually clear to me we want to implement fix
(or that it is possible to do in Rust, at least without a significant rework of the term environment? but I'm not sure) because we may not want to allow arbitrary unbounded recursion.
$ cat ./examples/ex6.poly
(defn fac
(fix (lam [fact n]
(if (== n 0)
1
(* n (fact (- n 1)))))))
(fac 7)
$ cargo run --bin poly ./examples/ex6.poly
Finished dev [unoptimized + debuginfo] target(s) in 0.01s
Running `target/debug/poly ./examples/ex6.poly`
Scheme([], TCon("Int"))
Env({Name("fac"): Scheme([], TArr(TCon("Int"), TCon("Int")))})
## potentially relevant leads
https://github.com/andrejbauer/plzoo/blob/master/src/poly/interpret.ml#L78-L80
^ this uses a `rec` syntactic form which holds the name (which will be recursively bound) and the expr to which it will be bound. that seems easier than our approach, which just checks that the `Expr` enclosed in a `fix` call is a function type, and passes it to itself. since we don't know what name the inner function will bind itself to, it's difficult to do a fixpoint-y substitution on the interpreter environment. we could consider switching to the `rec` form.
lazy evaluation in Rust: https://doc.rust-lang.org/core/lazy/struct.Lazy.html
I think that the main "trick" we need is a bit of laziness to allow the knot to be tied. I'm not sure how to wire that into the interpreter guts.
thread 'main' has overflowed its stack
fatal runtime error: stack overflow
[1] 18277 abort (core dumped) cargo run --bin poly ./examples/ex6.poly
this is presumably because my very-literal transcription of the tutorial implementation was too literal and overlooked Haskell's laziness and Rust's eagerness of evaluation.
migrated from mhuesch/poly-rs#16.
fix
was removed in #26 because of #17. the issue there was that the strict interpreter entered an infinite recursive loop by expanding fix
to an expression which contained another fix
. because of strict evaluation, this would loop forever.
in the lazy interpreter, my hunch is that we can re-add fix
pretty much exactly how it was before, and things will work. this is because in lazy evaluation the recursive evaluation of the "inner" fix
won't occur until the recursive use of the fixpoint-bound function is forced. I think this means we're ok.
I am creating this to post updates on what I'm looking at or thinking about, in a way that's not super refined. so reader beware!
extensible records would be 🔥
we currently have product types, with Pair
(2-tuples).
the approach outlined in this paper might lead to a nice mapping between underlying Rust structs (other data in Holochain) and types in the poly-rs
language.
a possible sketch.
.
would be a the field accessor syntax, in keeping with Lisp-y style.
((lam [r]
(. r field_x))
{[field_x 1]
[field_y 2]
opt_expr_for_record_to_extend})
https://gist.github.com/nikita-volkov/6977841
https://www.reddit.com/r/haskell/comments/1oicmu/anonymous_records_a_solution_to_the_problems_of/ (<- some criticism of the approach ⬆️)
maybe kinda related?
https://gist.github.com/Icelandjack/6b9745d30cc57900b51269942c3cd504
migrated from mhuesch/poly-rs#14.
Integrate the reputation DSL with some REA event data coming from a Holo-REA module and make calculation execution workable
try to figure out what a good structure might look like.
relates to #6.
creating a separate issue for this as it seems distinct from #8, but is a prerequisite for it.
current poly-rs
sits over on my account. my suggested plan is:
once I complete the work on toplevel-api over on the other repo, to push the repository from there to this repository. e.g:
cd /tmp
git clone https://github.com/mhuesch/poly-rs
cd poly-rs
git remote rm origin
git remote add origin [email protected]:sacredcapital/ReputationDSL.git
git branch -M main
git push -u origin main
that would import all files and the whole git history.
then we can transfer (probably manually, since it doesn't seem possible to import issues across organizations) the issues from the poly-rs
repo over here.
what should we name the cargo crates for this repo?
I like short names so r_dsl
could be nice, but it is a total acronym. we would also want a separate name for the parser (as part of #8).
this is a little tricky.
two approaches come to mind:
infer
them on them, & discard not-well-typed ones.Expr
of any non-miniscule size.Type
first, and then use that to guide generation of an Expr
which fits the type.migrated from mhuesch/poly-rs#13.
we could perhaps take inspiration from OCaml's module system (https://dev.realworldocaml.org/first-class-modules.html).
we will likely need to parameterize the module system by the module identfier, so that the language can be used both:
Entry
s, which are identified using EntryHash
es)the latter use represents a kind of "agent-centric" language, where code is not stored in a "data centric" central repository with a global naming scheme. as such it might be kind of novel.
I'm just going to scope this to line/col numbers.
fancier error messages require significantly more thought and probably can be deferred until a while later.
I think match/case is necessary for implementing sum types.
how else can you do case analysis?
(match r
[ (L l_body) ((+ 1) l_body) ]
[ (R r_body) 7 ])
^ we might not even have to do any deeper pattern matching - 1 level only?
migrated from mhuesch/poly-rs#15.
as background:
it is always nice to have more tests.
this project could use more tests.
specifically for the toplevel API.
but to scope the issue and make it complete-able, here we'll focus just on arguments (to the toplevel) of shape "List Value
where the list is quite long".
since we currently store all lists fully in memory, this will potentially cause OOMs.
define a test which features a poly-rs program like so:
(defn sum (foldl + 0))
sum
(meaning that we have a definition and then a reference to it in the program body).
then feed a gigantic list in through input_data
in the toplevel API.
try to limit the Rust runtime heap, so that it will OOM.
this will give us a bar for failure; next we can work on passing the test by defining a proper iterator-based incremental list.
Rather than implementing OM itself in Rust (something akin to Dimensional in Haskell), a simpler (but much less performant) option may be to embed an RDF reasoner that leverages the OM RDF ontology directly.
See https://github.com/HajoRijgersberg/OM#libs for some implementations of OM unit conversions in other languages.
SUM()
of a set of valueshttps://en.wikipedia.org/wiki/Boolean_algebra#Basic_operations:
(: and (Bool -> Bool -> Bool))
(: or (Bool -> Bool -> Bool))
(: not (Bool -> Bool))
could add the secondary operations too, or just implement them in stdlib: https://en.wikipedia.org/wiki/Boolean_algebra#Secondary_operations
We want the parser to be separate from the runtime so that others can implement their own parsers.
our interpreter currently creates unnecessary closures.
this may or may not be a big deal.
I'd say it seems likely that other issues are more pressing than interpreter performance.
since this is semi-fresh in my mind, I'm recording my thoughts here in case we want to investigate this in the future.
currently if we have ((lam [x y] (+ x y)) 1 2)
, we will create 2 closures.
we'll interpret the first lambda to get a VClosure
.
then we'll interpret the application of the lambda, do a substitution with 1
for x
into the body, then get out another VClosure
.
then we will interpret that body with 2
substituted for y
, and be done.
so we will do 2 packaging-up-s of closures, and 2 substitutive interpretations.
in this case, no closure is actually needed to evaluate the arguments or the closure body (since 1
, 2
, and (+ x y)
have no free variables - x
& y
are bound by the lambda).
this seems like a waste, when we could just substitute both at once, and not package up either closure.
IIRC, this is a safe optimization, but I can't remember the name for it...
I think they key is determining whether the arguments to the function applications have any free variables.
and perhaps also determining what variables are returned...
generate well-typed Expr
s (see #21).
evaluate them to Values, using the unoptimized interpreter and the optimized interpreter.
assert that both versions produce the same output.
derive equality on Value
(possible??) in order to decide equality.
this should be doable given the current definition of Value
, which would be a strong form of equality (I forget all the names for different equality-s, but this is a very literal form, where we would ideally just derive Eq
).
after demonstrating equivalence, using 2 modules, replace the unoptimized module with the optimized one.
OR, hide the optimization behind some feature flag / optimization level.
migrated from mhuesch/poly-rs#12.
the Sto
introduced in #35 is our "heap" - where much of rep_lang
's allocated memory lives. much of this memory might go out of scope and not be needed, as evaluation of a program progresses.
in order to evaluate large programs, or ones where the amount of processed data exceeds machine RAM/swap (which is a required goal of the language), we will need to be able to collect garbage as we go.
#[derive(Clone)]
pub enum Value {
...
VList(Vec<Value>),
...
}
change to
#[derive(Clone)]
pub enum Value {
...
VList(Iterator<Item = Value>),
}
Value
(: < (Int -> Int -> Bool))
(: > (Int -> Int -> Bool))
optionally, <=
, >=
, although those can be implemented in stdlib using the above primitives, and then boolean primitives (see #44)
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.