
mary's Introduction

Mary

Mary is the successor of Marx, a content delivery and assessment engine based on markdown and git

mary's People

Contributors

fredriknordvallforsberg, gallais, lamudri, pigworker


Forkers

kristleifur

mary's Issues

generate Shonkier terms from Pandoc divs and spans

We should be able to write templates for parts of documents in pandoc markdown. If you write something like

::: {mary="foo(x)"}
This is a paragraph which uses `x`{.mary}.
:::

that should amount to the definition of a shonkier function which computes a Block from an Inline.

Invoking this with

```{.mary}
foo("a template")
```

should amount to generating the paragraph block

This is a paragraph which uses a template.

Implementing this will involve building a ToTerm class whose instances specify how to turn values in various pandoc types into pandoc terms, splicing in fenced shonkier terms as you go.
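For concreteness, here is a hedged sketch of what such a ToTerm class might look like. Term is only a stub standing in for Shonkier's term type, and the instances are illustrative rather than Mary's actual ones.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import Text.Pandoc.Definition (Inline(..))

-- Stub standing in for the Shonkier term type.
data Term
  = TAtom Text
  | TString Text
  | TCell Term Term
  | TNil

class ToTerm a where
  toTerm :: a -> Term

instance ToTerm Text where
  toTerm = TString

instance ToTerm a => ToTerm [a] where
  toTerm = foldr (TCell . toTerm) TNil

instance ToTerm Inline where
  toTerm (Str t) = TCell (TAtom "Str") (TCell (TString t) TNil)
  toTerm Space   = TAtom "Space"
  toTerm _       = TAtom "UnsupportedInline"  -- the real instance covers every constructor
```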

Guards? Or what?

Before we can even think about guards, we should ask how Boolean values are to be represented. Is it [] and 'true? Is it 0 and 1? Is it altogether more wibbly-wobbly?

Nothing is stopping us doing p | g -> e or the like.

I guess my question is whether p -> (1=g)(e) should mean the same thing.

When should IncompletePattern requests be handled by offers of alternative matches? Fun design space.

No magic strings relating to specific deployments

The base URL for the deployment, used to expand relative links, should come either from a command-line option or from an environment variable.

This is on the basis that it's just about OK to customize by editing index.php.

add semicolon

We can implement semicolon as

semi(x,y) -> y

so that

semi(e1,e2)

gives us the effects of both and the value of the latter.

But it would just be nicer to write

e1; e2

It's straightforward to add a new Frame constructor for "left of ;".
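For what it's worth, here is a hedged sketch of the idea over stand-in types (these are not Mary's actual Frame, Term, or Value definitions): the new frame parks the right-hand expression while the left runs, then throws the left's value away.

```haskell
-- Stand-in types for the sketch only.
data Value = VNil
data Term  = TSemi Term Term | TDone Value
data Frame = SemiL Term                       -- the proposed "left of ;" frame

type Machine = ([Frame], Term)

step :: Machine -> Machine
step (fs, TSemi e1 e2)        = (SemiL e2 : fs, e1)   -- start on e1, park e2
step (SemiL e2 : fs, TDone _) = (fs, e2)              -- discard e1's value, run e2
step m                        = m                     -- all other transitions elided
```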

We have the braces. It's time we had the braces' friend!

Explicit Environments

I started talking about explicit environments on #37 and on Slack. Let me be both broader and deeper in an issue of its own.

The idea is to introduce a crude but alarmingly effective form of what some might call "object orientation" if they wanted to try to wind us up: as a tactic, it wouldn't work on me, anyway.

Yer regular "mathematical" functions are, at least in a cbv language, an extreme form of contextualization. In f(a), f contextualizes a by waiting patiently for its value, and then postprocessing it. Meanwhile, back in dear old Pascal, one could write

with r do c;

where r is an expression of record type and c is a command for which r's fields are in scope. c could, of course, be return e, but it was a tad annoying that there was no version of with in the expression language, only in the command language. One of my main irritations (for it is superficial) with oo style is all that projection all over the place like bad acne. I went through a period (ha ha) of pronouncing . as "spot" rather than "dot". IIRC, there's a version of this rant in my thesis. Anyhow, r contextualises c by allowing lookup as c runs, and quietly going away when c stops. It's pretty much the other extreme from functions, when it comes to what activities "contextualization" amounts to.

So I'm proposing a notion of first class environment e which allows us to write

e(a)

to contextualize the evaluation of a by the value bindings in e.

What is such an e? It's an association tree.

  1. The empty environment is []
  2. A singleton environment is ['var | value]
  3. A binary environment is [env0 | env1] where env0's bindings may shadow env1's.

So, we're exploiting the fact that atoms apart from [] are the quotations of identifiers. (Indeed, perhaps it is now stretching a point to call [] an atom, given that it does not quote an identifier, you can't write it with a quote-mark, etc. But it is, at least, genuinely indivisible, which quoted identifiers wouldn't be if we were to allow them to explode as strings.) We can tell the difference between a singleton and a binary by checking their heads. We can thus crunch such a structure down to a Haskell map or a JS object, when we need to use it as an actual environment.
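As a hedged illustration (over a simplified stand-in for the value type, not Mary's actual one), crunching such a tree down to a Data.Map might go like this, with front bindings shadowing back ones:

```haskell
import qualified Data.Map as Map
import           Data.Map (Map)

-- Simplified stand-in for Shonkier values.
data Value
  = VAtom String          -- 'foo; the empty atom [] is VAtom ""
  | VCell Value Value     -- [v | w]
  -- numbers, strings, functions elided

crunch :: Value -> Map String Value
crunch (VAtom "")                    = Map.empty                          -- []
crunch (VCell (VAtom a) v) | a /= "" = Map.singleton a v                  -- ['var | value]
crunch (VCell e0 e1)                 = Map.union (crunch e0) (crunch e1)  -- left-biased: e0 shadows e1
crunch _                             = Map.empty  -- not an environment; real code might complain
```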

Why is it a tree and not a list? It's a monoid, so why bother normalising it? Also, you get to preserve sharing when you build bigger environments from smaller ones. Bargain basement inheritance is gained by the expedient of shoving old environments at the back end of new ones.

In the it-looks-like-C-but-it-so-isn't brutality of this language, I am sorely sorely tempted to make the notation

x = s

sugar for

['x | s]

so that we can (never let that mean assignment) also write things like

[x = 5  y = 7  [z = 3]]

and yer common or garden

let x = s in t

becomes

(x = s)(t)

Around the corner, there's the option to consider pattern matching as "compute an explicit environment or abort", where our beloved -> is also concealing yet another use of application-as-contextualization.

It's cheap; it's fun; it's encapsulation for hooligans. It's the polite version of dynamic binding, just as we're already doing the polite version of delimited control.

Shall we do this?

infix arithmetic/logical operators

We could do with some basic infix gadgetry for arithmetic, numerical comparison, and Boolean logic. I'm struggling with my conscience as to the representation of Boolean values. Do we overload 0 and 1? Do we do the Lisp classic [] and 't? The former, I think.

This is mostly a parser hacking job.

Interesting questions for the future involve whether we let the punters define infix things. For the moment, we just hardwire enough stuff to build the sorts of formulae we use in marking.
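As a hedged sketch (not Mary's actual parser), here is the usual precedence layering for + - * /, written against Attoparsec since the project already parses with it; the toy Expr type and the operator set are illustrative.

```haskell
import Control.Applicative ((<|>))
import Data.Attoparsec.Text

-- Toy expression type for the sketch.
data Expr = Num Rational | Bin Char Expr Expr deriving Show

-- Left-associative chaining (Attoparsec has no chainl1 of its own).
chain :: Parser Expr -> Parser (Expr -> Expr -> Expr) -> Parser Expr
chain p op = p >>= rest
  where rest x = (do f <- op; y <- p; rest (f x y)) <|> pure x

expr, term, factor :: Parser Expr
expr   = chain term   (Bin <$> (skipSpace *> satisfy (`elem` "+-")))
term   = chain factor (Bin <$> (skipSpace *> satisfy (`elem` "*/")))
factor = skipSpace *>
  (     Num . toRational <$> (decimal :: Parser Integer)
    <|> char '(' *> expr <* skipSpace <* char ')' )

-- parseOnly expr "1 + 2 * 3"
--   ==> Right (Bin '+' (Num (1 % 1)) (Bin '*' (Num (2 % 1)) (Num (3 % 1))))
```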

Add newtype wrapper for Atoms

This way we can define a lot of typeclasses targeting atoms directly.
Note that providing an IsString instance will allow us to keep all of
our literals thanks to overloaded strings.
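A minimal sketch of the proposal; the derived instances and the pretty-printing example are assumptions, not Mary's actual code.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.String (IsString(..))
import Data.Text (Text)

newtype Atom = Atom { atomName :: Text }
  deriving (Eq, Ord, Show)

-- With OverloadedStrings, literals like "cons" can be used wherever an Atom is expected.
instance IsString Atom where
  fromString = Atom . fromString

-- Typeclasses can now target atoms directly, e.g. a pretty-printer:
prettyAtom :: Atom -> Text
prettyAtom (Atom a) = "'" <> a
```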

How to store user data: discuss

Sometime soon, we'll need to make suitable arrangements for storing various forms of persistent data. Most of it is form data, per user per page. Some of it is configuration data, per page. Of course, there are also administrative data (information about cohorts, group assignments, etc).

I'll be the first to confess that the way I handled this in Marx was fairly ghastly. The per user per page data just lived in log files recording the time of each visit and which key-value pairs got created/mutated by that visit. The only things logged were form fields: generated content could be served to the browser but not logged. Log files were written by creating a temporary file, then cp-ing over the old log file. There was no serious attempt to deal with data races. For students, with everybody editing their own data, races aren't a big deal. Data races are more of an issue for config data being edited by staff, or people sharing the marking jointly editing the macros for common comments.

Also, it's important to be able to query these data sources with friendlier tools than grep.

Repository Management

I'm now actively thinking about how to install Mary.

The plan is that people build sites on GitLab. The question is how to ensure that their changes to those sites get propagated.

For the moment, I'm expecting that there will be clones of all the sites living in my filespace (with a keep-out Apache configuration). The question is how to decide when to git pull. Options include

  1. cron job (which is the Marx approach; Marx knows nothing about it)
  2. make Mary git pull the repo every time it is accessed (might be slow)
  3. make Mary git pull the repo if this hasn't happened in the last n minutes
  4. allow an option in the GET data to demand a git pull (especially useful if you're the author; perhaps permitted only if you have authorisation)

At the moment, I'm minded to implement 4 first (so anybody can demand a pull), then try supplementing it with 3.
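A hedged sketch of option 3 (pull at most every n minutes), assuming a stamp file that records the last pull; this is illustrative, not the mechanism Mary actually uses.

```haskell
import Control.Monad (unless)
import Data.Time (diffUTCTime, getCurrentTime)
import System.Directory (doesFileExist, getModificationTime)
import System.Process (callProcess)

-- Pull the repo unless we already pulled within the last 'minutes' minutes.
pullIfStale :: FilePath -> Double -> IO ()
pullIfStale repo minutes = do
  let stamp = repo ++ "/.mary-last-pull"   -- hypothetical stamp file
  fresh <- do
    exists <- doesFileExist stamp
    if not exists then pure False else do
      t   <- getModificationTime stamp
      now <- getCurrentTime
      pure (realToFrac (diffUTCTime now t) < minutes * 60)
  unless fresh $ do
    callProcess "git" ["-C", repo, "pull", "--ff-only"]
    writeFile stamp ""                      -- touch the stamp
```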

What I don't know is whether GitLab continuous integration can do anything useful for us.

Port the interpreter to JavaScript

The rock bottom way to get our code running client-side is to port the interpreter to JavaScript and then render parsed code as whatever our JS representation of terms might be.

It's not clear how much mechanical assistance we can get with this port (which we have to keep in sync with the Haskell version), but at the moment, it's not such a big job.

Continuations Scoping Effects?

At the moment, when we write a handler

handle(x,{'foo(y) -> k}) -> handle(f(x,y), k(e))

we evaluate e and feed its value to k via use. Any effects (e.g., 'foo) done by e are handled by handle and the context in which it runs, not by k.

But we could just as easily push k's frames onto the stack before evaling e. This behaviour can already be simulated by making 'foo(y) return a thunk which k forces, but it strikes me as unpleasant to demand that, and counterintuitive that k(e) is not fully contextualizing the evaluation of e.

I'll fork a k-eval branch in which we can try this out, by contrast with the current k-use behaviour.

I'm optimistic. It seems like something that is missing from Frank. You ought to be able to make deals like "you're only allowed to ask the network for stuff if you can handle the network going down".

Add a javascript test suite

  • Expose the javascript compiler
  • Find the appropriate command to run js locally
  • Add a test suite based on the runner in Test.Utils

Bonus:

  • Port the javascript renderer to Haskell
  • Make sure the test suite produces the same output for js & Haskell

boolean equality

I'm wondering what == should do.

Here's my first guess (sketched in code after the list).

  1. cells are equal componentwise
  2. nils are equal
  3. atoms are equal if they have the same name
  4. numbers are equal in the usual way
  5. literal strings are equal if they have the same characters in the same order
  6. testing equality of anything higher order triggers 'HigherOrderEquality
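A minimal sketch of that guess over an illustrative stand-in for the value type; in reality 'HigherOrderEquality would be an effect request rather than the Left used here.

```haskell
-- Stand-in value type for the sketch.
data Value
  = VNil
  | VAtom String
  | VNum Rational
  | VString String
  | VCell Value Value
  | VFun String              -- stands in for closures and primitives

eq :: Value -> Value -> Either String Bool
eq (VCell a b) (VCell c d) = (&&) <$> eq a c <*> eq b d   -- componentwise
eq VNil        VNil        = Right True
eq (VAtom a)   (VAtom b)   = Right (a == b)               -- same name
eq (VNum a)    (VNum b)    = Right (a == b)
eq (VString a) (VString b) = Right (a == b)               -- same characters, same order
eq (VFun _)    _           = Left "HigherOrderEquality"
eq _           (VFun _)    = Left "HigherOrderEquality"
eq _           _           = Right False                  -- different shapes
```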

Brace Deep Sections

We should be able to construct abstractions on the fly by writing _ in expressions which get abstracted in left-to-right order at their closest enclosing curly braces. E.g.

{"w" == _}

is the function which tests equality with "w".

{_+3*_}(2,5) -> 17

It's a matter of turning the braces into extra args in a lambda.

{x -> f(x,_)}

means

{y,x -> f(x,y)}

It's a valuable source of brevity.

mary repl

Should

mary repl foo.shonkier

perhaps load the definitions from foo.shonkier, then let you muck about with evaluating expressions?

Extended patterns

Gathering here the proposal to extend the pattern-matching language
(for rational patterns, see #43) as well as interesting examples using
the introduced features.

  • an intersection pattern p@q matches if both p and q match (right-to-left shadowing)

  • a union pattern p+q matches if either p matches, or p does not and q matches (usually one would use | but here it would clash with Cell...)

zip([[]|xs] + [xs|[]]) -> xs
zip([[x|xs] | [y|ys]]) -> [[x|y] | zip([xs|ys])]

This example is a bit artificial and requires the function to be uncurried... :/

  • a negation pattern !p matches if p does not match (no bindings)

flatten([])                  -> []
flatten([x@!([]+[_|_]) |xs]) -> [x|flatten(xs)]
flatten([x|xs])              -> append(x,flatten(xs))

Support comments in the parser

Both single line and multiline comments please! And they should nest properly!

Bike-shedding about the delimiters is welcome. By default I would go for
Haskell's but I don't really care tbh.

Fix rendering of VFun in Shonkier.Pretty

At the moment we dump the LocalEnv' a and print the raw clauses. This is
of course buggy, cf. anonymous.gold where the result of evaluating:

{ x -> { y -> x }}('hi)

is printed as

{y -> x}

instead of

{y -> 'hi}

We can:

  • either print an explicit substitution (adding let-bindings to the
    language would even allow us to make these valid syntax)

  • or perform the substitution before printing the clauses

Remove separators between clauses in curly braces?

One of the metarules in the design of Mary's syntax is that there is no composition by mere juxtaposition. This has to be the case if [e1 e2] is to be recognizably a list of length 2. As a result, we can be super-loose about layout.

append([], ys) -> ys   append([x | xs], ys) -> [x | append(xs, ys)]

is just fine all on one line. Indeed, we make no distinction between newlines and other whitespace.

So now I'm asking myself why anonymous functions in curly braces separate clauses with |. There is no ambiguity in

{p11,...,p1n -> e1   p21,...,p2n -> e2   ...   pm1,...,pmn -> em}

with only whitespace separation.

I propose that we switch to whitespace separation. Any objections (technical or moral)?

Patterns for strings

If we are going to want to write parsers in Shonkier, we are going to need
to have patterns for strings. My first instinct was to reuse list syntax and
write:

['a'|str] to match strings with a prefix "a"
[c'''c|str] to match strings with a prefix "'" (c can be any identifier)
[c d e|str] to match strings of length at least 3

However this makes it impossible to distinguish between a pattern for a list
of strings and one for a string. This breaks examples such as enum where
I used [_|_] to detect whether something was a (non-empty) list or whether
I could assume it was a string.

But maybe such examples should be forbidden? Thoughts?

Serialization of Values, Especially Environments

We're very close to being able to serialize values, as we can now prettyprint functions. The main thing missing is that we can't yet parse or prettyprint frames. Not that there's terribly much complexity involved in doing that. Given that environments are first class values mapping identifiers to values, it would be kind of awesome to be able to just save and load them from textfiles.

At some point, we'll need to do exactly that, to achieve cross-session persistence, configuration information and all that business.

Let us suppose we have some notion of .rho file representing a serialized environment. Perhaps we might be able to invoke

mary shonkier --env=foo.rho bar.shonkier

which prettyprints the value of

foo; main()

where foo is the environment given by deserializing foo.rho.

For simple queries, we might also want to be able to invoke

mary shonkier --env=foo.rho --main="<term>"

Now, exactly what should be the programmer's interface to serialization, I'm not sure. We have lots of options...

But if we have the underlying ability, we can conjure.

That said, we may in time prefer not to spend energy prettyprinting and parsing files which are not ultimately for human consumption. We may prefer a different format for pickling and unpickling (did someone ask how to preserve sharing?), but that's a problem to solve when we have it.

disambiguating parentheses are ambiguous

One of the abiding design principles of our syntax is that the juxtaposition of two valid expressions is never a valid expression. That's how come we can write lists like

[e1 e2 e3]

Except that I broke this principle horrendously by allowing parentheses for disambiguation.

[(e1) (e2) (e3)]

is currently a one element list computed by iterated application.

With infix operators, that's bound to happen.

What does

foo (x + y) * z

mean?

The easiest fix is to change the principle to "no two valid expressions separated by whitespace ever constitute a valid expression", while insisting that application allows no whitespace between the function and the opening paren of its arguments. i.e.

e1 (e2)

is two things, but

e1(e2)

is one.

But perhaps there are other ways out of this rat sack.

Relative Links and Pubs

Now that we can serve pages, we need some basic machinery that isn't especially Shonky-related. We need to build sites where there are multiple pages connected by relative links. Pandoc's Inline type has a constructor for Link, so we need to build a filter which detects which of those links are relative and transforms them into links with a different ?page.
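A hedged sketch of such a filter using Text.Pandoc.Walk; the site parameter and the crude isRelative test are illustrative assumptions, not Mary's actual behaviour.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T
import Text.Pandoc.Definition
import Text.Pandoc.Walk (walk)

-- Crude relativity test: no scheme and not absolute on this host.
isRelative :: Text -> Bool
isRelative u = not ("://" `T.isInfixOf` u) && not ("/" `T.isPrefixOf` u)

-- Rewrite relative link targets into ?page= requests against the Mary entry point.
relink :: Text -> Inline -> Inline
relink site (Link attr inls (url, title))
  | isRelative url = Link attr inls (site <> "?page=" <> url, title)
relink _ i = i

rewriteLinks :: Text -> Pandoc -> Pandoc
rewriteLinks site = walk (relink site)
```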

Meanwhile, a thing Fred did for Marx which is well worth pinching is that any file in a site under a directory named pub is allowed to be served. The trouble is that Mary repos live in a private filespace, so you can't just give those files a direct url. The trick is to use a wee bit of php to check whether the file is in a pub and if so, fetch and serve it. Probably, we just extend index.php so that the usual Mary url is the contact point, but with a ?pub in $_GET, instead of a ?page. We could then forward the request to Mary with a suitable new command line option which checks the path for the presence of pub and responds accordingly.

Again, relative Link and Image data would need to be mangled appropriately by mary pandoc.

Add more tests for import

  • Import of f & implicit import of f (via transitive dependency) should not raise an "AmbiguousName" error when using f.

  • Diamond imports (A -> B, A -> C, B -> D, C -> D) with conflicting definitions
    but not used in the main (=> should work just fine)

  • Diamond imports & use conflicting definition (=> should raise "AmbiguousName")

  • Imports in a mary page

Literal strings should be `Text`, not `String`

Ironically, Attoparsec makes it much easier to find the closing delimiter of a string literal if you want to extract the list of characters in between, in spite of the existence of Data.Text.breakOn.

Even so, given that Pandoc trades with us in text, and our main use of string literals will be feeding text to Pandoc, we should minimize currency exchange.

Create Main function which can act as a Pandoc filter

We should be able to invoke pandoc with the filter

mary pandoc

written as documented by this online tutorial.

Text.Pandoc.Definition gives the structure of Pandoc documents; Text.Pandoc.Walk gives tools for traversing and querying them.

In particular, we should be able to find all the code blocks with attribute mary-def, accumulate their enclosed text (separated by newlines), and replace them with null blocks. The accumulated text can then be parsed as a bunch of definitions.
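A hedged sketch of that first pass with Text.Pandoc.Walk; the attribute name follows the issue, and Plain [] stands in for a null block.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T
import Text.Pandoc.Definition
import Text.Pandoc.Walk (query, walk)

-- Gather the text of every code block carrying the mary-def class.
maryDefs :: Pandoc -> Text
maryDefs = T.intercalate "\n" . query getDef
  where
    getDef (CodeBlock (_, cls, _) txt) | "mary-def" `elem` cls = [txt]
    getDef _                                                   = []

-- Blank out those blocks once their definitions have been collected.
stripDefs :: Pandoc -> Pandoc
stripDefs = walk dropDef
  where
    dropDef (CodeBlock (_, cls, _) _) | "mary-def" `elem` cls = Plain []
    dropDef b                                                 = b
```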

With that done, it should be possible to find all code blocks and inline code with the attribute mary, and treat them as expressions to be evaluated, yielding values which get rendered as Pandoc blocks or inlines, replacing the code.

Correspondingly, we should be able to make Mary process markdown files containing embedded code which computes content.

Rat Pats?

Would we like patterns for rational numbers? If so, what are they? We can currently test whether a value is a particular constant. Of course, we can do a bunch of stuff with right-programming. We could leftify that by adding guards. But is there any value to making the pattern language more expressive? (Haskell's n+k patterns are a data point in this space, but they were always perplexing.)

We could consider

  1. A pattern to test if a value is a number.
  2. Patterns which enforce an interval of acceptable matches (e.g., being at least 0).
  3. Patterns which enforce being not only rational but integral.
  4. Patterns which invert affine functions of one variable.

It's a bit late for an April Fool, but I think it would be hilarious to make

foo(x + y, x - y) -> [x y]

mean the same as

foo(a, b) -> [(a + b)/2 (a - b)/2]

We should be guided by power-to-weight ratio, and by readability concerns.

I guess we should do infix first.

Add namespace contextualisation for whole expressions

At the moment we can only write List.filter p (List.map f xs) but it would
be nice to be able to write List.(filter p (map f xs)) and have it mean the
same thing (given that none of p, f, and xs are in namespace List).

I wanted to do that but somehow forgot about it.

Form elements

Pandoc does allow us to write form elements, but only in that we're allowed to write things like

`<input type=text name="foo">`{=html}

`<input type=submit>`{=html}

I think we should write less and achieve more. We need some way to signal the existence of form fields to the shonkier runtime, so that field value requests can be handled. We should also be able to attach a preprocessor to a form field which eats text and emits a value or aborts; field value requests should yield the output of the preprocessor.
(One day, when our language is typed, the preprocessor will be type directed.)

We could have something like

`'foo`{.in type=text size=12}

to signal a form field bound to the atom 'foo. To access this field, invoke

'field('foo)

We might, further, write

`'bar <- parser`{.in type=text size=100}

where parser is an expression yielding a function which eats string literals.

FromValue should be allowed to fail

As discussed on slack, it would be nice to change the class FromValue so
that we may return Maybe t rather than t. This would for instance allow
us to fail hard in .mary test cases when the generated pandoc value is invalid.
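A sketch of the proposed signature change; Value is only stubbed here to stand for Shonkier's value type, and the instances are illustrative.

```haskell
-- Stub constructors; the real value type has more.
data Value = VNil | VNum Rational | VCell Value Value

class FromValue t where
  fromValue :: Value -> Maybe t        -- Nothing signals an invalid encoding

instance FromValue Rational where
  fromValue (VNum r) = Just r
  fromValue _        = Nothing

instance FromValue t => FromValue [t] where
  fromValue VNil         = Just []
  fromValue (VCell v vs) = (:) <$> fromValue v <*> fromValue vs
  fromValue _            = Nothing
```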

Replace circular programming in the top level `Env` by explicit recursion

At the moment, we use Oxfordesque circular programming to construct an environment of recursive definitions, all in scope for each other. Although the code runs OK, the resulting circular values are unserialisable. We could, instead, maintain a global environment of top-level stuff, which would then not need to be stored in closures, or use some other explicit mapping from recursively defined names to their meanings.
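An illustration of the difference over stand-in types (not Mary's actual Env): the knot-tied construction bakes the whole map into every value, whereas explicit lookup keeps the top level an ordinary, picklable map.

```haskell
import qualified Data.Map as Map
import           Data.Map (Map)

type Name = String

-- Knot-tied ("Oxfordesque"): each definition is elaborated against the very
-- map being built, so every value holds a reference into the cycle and
-- cannot be serialised.
tieKnot :: Map Name (Map Name v -> v) -> Map Name v
tieKnot defs = env where env = fmap ($ env) defs

-- Explicit recursion: values mention names only; the global map is consulted
-- at call time, so it stays a plain finite value.
resolve :: Map Name v -> Name -> Maybe v
resolve globals n = Map.lookup n globals
```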
