misalcedo / tortuga Goto Github PK

A CGI and WCGI server for HTTP/1.1

License: Apache License 2.0

Rust 99.69% Dockerfile 0.31%

tortuga's Introduction

📢 About Me

Hi, my name is Miguel Salcedo. I am currently working as a Software Developer at . Before that, I worked at for close to 6 years.

🎮 Hobbies

I love to program on my latest idea, watch anime and play video games. Although my 👶 daughter monopolizes most of my time during the day, I still often spend my nights in front of a 💻 or 📺 screen. I start most of my mornings with a cup of ☕ and will have 1 or 2 more before the day is done.

⚡ Fun fact: I don't remember the last time I read a book that was not software-related. Though, I do know it was a book in the Witcher series.
⚡ Fun fact: In the order I learned them, I speak: Español, English and some 日本語.

📋 My Projects

Tortuga 🐢 - An HTTP server implementing the CGI protocol for system processes and WASI module instances.
PPP 💻 - An HAProxy Protocol Parser written in Rust. Parses versions 1 and 2 of the protocol.
WASM-AST 🕸️ - A WebAssembly Abstract Syntax Tree. Meant to ease WebAssembly code-generation for compilers written in Rust.
Literate-rs 📔 - A Literate Programming tool that extracts the contents of fenced code blocks from Markdown files.

tortuga's People

Contributors

Stargazers

Watchers

tortuga's Issues

WASM Interface types

Implement support for interface types to have the host and guests communicate.

https://github.com/WebAssembly/interface-types/blob/master/proposals/interface-types/Explainer.md

Extend AST

The goal is to mimic erlang processes here with a process hierarchy.
exit criteria is being able to implement a simple ping pong that toggles between the numbers 1 and -1.
Also add support for negative numbers for the whole part.

processes should be able to spawn any number of children, but also be able to define children to spawn at the start of its lifecycle (before any messages are sent in the WASM start function).

Define a new version of the grammar to remove the need for `@IDENTIFIER`s

The following are the changes to the grammar:

optimize wasm output

rely on https://github.com/WebAssembly/binaryen to improve emitted wasm binary

Update book

Rename intro section.
Add getting started section.

See https://doc.rust-lang.org/book/ch00-00-introduction.html

point readme to book to dedupe
Add commands section https://rust-lang.github.io/mdBook/cli/index.html
Move updating the book to its own action on pushes to the docs directory

Update workflows

Update lint and release workflows to use cargo cache and actions-rs actions.

Implement async LEB128

Implement my own LEB128 writing using the C-like psuedo-code here: https://en.wikipedia.org/wiki/LEB128
Since I never read the values myself (done by wasmtime), I do not need the read functionality and can remove a fairly simple dependency.

Encode unsigned integer

do {
  byte = low-order 7 bits of value;
  value >>= 7;
  if (value != 0) /* more bytes to come */
    set high-order bit of byte;
  emit byte;
} while (value != 0);

Encode signed integer

more = 1;
negative = (value < 0);

/* the size in bits of the variable value, e.g., 64 if value's type is int64_t */
size = no. of bits in signed integer;

while (more) {
  byte = low-order 7 bits of value;
  value >>= 7;
  /* the following is only necessary if the implementation of >>= uses a
     logical shift rather than an arithmetic shift for a signed left operand */
  if (negative)
    value |= (~0 << (size - 7)); /* sign extend */

  /* sign bit of byte is second high-order bit (0x40) */
  if ((value == 0 && sign bit of byte is clear) || (value == -1 && sign bit of byte is set))
    more = 0;
  else
    set high-order bit of byte;
  emit byte;
}

Implement a runtime

Create a runtime for tortuga.

The runtime will expose an HTTP 2/3 interface and handle requests by creating a new module instance.

Modules are mapped to a uri and method pair. The headers, http version, and method are not passed to the actor. Only the request body and uri is passed.

Add a subcommand called start that runs the server.

Read CLI banner information from Cargo

Create a build script to output cargo information as singleton instance of a struct object.
The main file will load the included module to read all of the cli infomation.

Implement AST (Abstract Syntax Tree)

Actor

Name - An URI used to reference this actor.
Signature - The bit-string interpretation of the incoming message.
Messages are verified for expected length. No other verification is performed.
Any serialization happens at the system level.
Intent - The list of expressions to perform in response to a given message.
Children - Dependent actors that must be initialized in order to process a message.

Implementation Notes

Use YAML to encode AST in order to experiment with different syntaxes without having to design a new language lexer and parser each time.
Start with single intent per actor.
Only support a single message interpretation per actor.
Read actor dependencies into a Directed Acyclic Graph (DAG).
Actor paths are URIs. Not yet sure whether to make them relative paths or not yet, but must be a valid relative or absolute URI. Similar to Go in that relative paths are expected to be found only in the current project.
Define intent as a list of instructions with fields that define the instruction arguments (e.g., Send, Multiply, Negate, etc.)
Types for signature are number, bit vector. The signature assigns names to each portion of the message to be referenced in the intent.
Children maps a set of names to a path. Currently, there is a requirement that a path can only be mapped to a name once. This is to discourage implementing load balancing as part of an actor.

Create releases with binaries

Follow the example here.

https://github.com/rust-lang/mdBook/blob/master/.github/workflows/deploy.yml

Basically, we can trigger a workflow on a new release being created to build for each target and add the executables.

Ideally, could have a workflow that adds a tag with the current version if no release exists for that version. That way the tags are added automatically.

Then I can trigger releasing to crates.io and docker by manually creating a release for the auto-generated tag.

Implement MVP

Follow crafting interpreters to create a minimal language

Prevent simple infinite recursion in the interpreter.

The tree-walking interpreter currently allows infinite recursion in the form of:

@x = x

That code would cause the interpreter to panic with a stack overflow error.

The goal is to prevent recursion for functions with no arguments (i.e., nullary functions).

See https://en.wikipedia.org/wiki/Arity

Also n-ary functions with nullary function parameters must not recurse.

Lastly, should verify if we should recursively check for infinite recursion. In other words: should an n-ary function with n-ary function parameters that are nullary recurse (and so on recursively, how many levels deep should we check)?

Error Handling

Expand error messages to be pretty printed and have more variants.
Rust and PEST both print error messages with the line provided and the lexeme underlined.

Async

Add async support in order to compile each file in its own task.

Look to using tokio and futures for async runtime.
As much as possible, we should avoid depending on any non-std imports that are cross-cutting. Where necessary, we should depend on types in futures instead of tokio to make it easier to switch runtimes.

Expand the grammar to match on natural numbers

Add a pattern to only allow natural numbers instead of real numbers.

Some example patterns:

natural = name ".0" ;
natural = name % 1 = 0 ;

Still an open question what the pattern should be.

Create a Markdown book for documentation

Use https://rust-lang.github.io/mdBook/ to create a book from the files in the docs directory.

Forward tortuga.salcedo.cc to the book

CLI artifacts as releases

On a new release, add the CLI executable as artifacts to the repo

Add panic mode to the parser

Allow the parser to identify multiple errors in an input file by synchronizing upon encountering an error. Synchronization means attempting to parse an expression, then skipping a token each time that does not work.

Most helpful with a backtracking parser as we can guarantee only one token is skipped on each synchronization pass.

Semantic analysis

No matching pattern

Infer return types of functions and validate that at least one pattern matches.

Function used only if in scope

Verify a function name is only referenced if the function has been defined.

Range endpoints in increasing order

For endpoints x, y x <= y.

Limit patterns to const arithmetic.

An arithmetic expression can be used within a pattern if it meets both of the following two conditions:
* It uses only numeric or bitwise operators.
* Its value can be evaluated to a constant when complied.

Fix windows releases

Windows releases currently upload a tar gz. Need to figure out why and switch it to a 7z.

define transformer to wasm

Take a process and generate WASM to be executed by a runtime.

Update documentation

Document the base types of the language: natural & real numbers, intervals, tuples, and functions.
Update the examples with the new grammar.

Implement Lexer as iterator

The lexer returns an iterator that tracks the current seek into the file.

Consider BEAM over WASM as the target for tortuga

BEAM is the Erlang Virtual Machine. Though WASM is well documented, BEAM is less so.

Some documentation does exist:

Create a playground website

compiles the interpreter to WASM and run that in the browser

Similar to Rust Playground but entirely self-contained in a static site

Fix released binaries

Linux release fails on missing glibc.
Also target/release should be removed from the tarball.

Create a C-based WASM example

In order for the runtime to be multi-language I need to define an interface that works well with any WASM language. For now, the best option is C. Though, Rust and Assembly script would be good candidates.

Expand grammar to support tuples

Creating a tuple looks like this:

tuple  = "{" members? "}" ;
values = expression ( "," expression )* ;

To match a tuple as a pattern:

explode = "{" members? "}" ;
members = pattern ( "," pattern )* ;

We also need to expand assignment to be:

assignment = pattern "=" block ;

To do that we need to change how we match identifiers in a pattern.
This is still an open question.

Benchmark

Add a Fibonacci number benchmark similar to the one in Lox.

@fibonacci(@n <= 1) = n
@fibonacci(@n) = fibonacci(n - 2) + fibonacci(n - 1)

@loop(@i = 0) = fibonacci(I)
@loop(@i) = [
  fibonacci(i)
  loop(i)
]

Use existing crate for web assembly types

See https://github.com/paritytech/parity-wasm
Could extract my code into its own crate or use an existing one.

Make wasm syntax easier to work with

Currently the syntax is verbose and performs no validation on the cross-references between types.

One way is by adding constructor methods to each module sub type on module that's returns an index type. The index type implements into usize when paired on a tuple with a module.

Expand grammar to support byte strings

Bytes strings can be their own grammar or an extension on tuples. For now, I think making them built on top of tuples with some bit-specific patterns would work best.

Grammar:

Bytes         = { "\"" ~ Byte* ~ "\"" }
Byte          = _{ "0" | DecimalByte | BinaryByte | CodePointByte | HexByte }
HexByte       = @{ "16#"? ~ ASCII_HEX_DIGIT{2} }
DecimalByte   = @{ "10#" ~ ASCII_NONZERO_DIGIT ~ ASCII_DIGIT{0, 2} }
BinaryByte    = @{ "2#" ~ ("0" | "1" ){8} }
CodePointByte = @{ "'" ~ CodePoint ~ "'" }
CodePoint     = { "\\n" | "\\r" | "\\t" | "\\0" | ANY }

Example:

bytes = "FF '"' 10#32 '\' 10#256 2#01001000 ' ' AA '\n' ''' '\0'"
;"FF x:4 y:4 | rest"

Add counter example from book

Remove `Value::Unit`

Switch overloaded operators to return a Result<Value, RuntimeError> instead of Value::Unit. This allows better error messaging in the REPL.

Will require a new error about being unable to perform an operation on 2 values of some types.

Update docker tags

Don't update latest on PRs only on push to main.

Use a build of Tortuga with version command to get the proper tag.

Could make existing workflow only run on push to main and add docker steps to build.yml

WASM Module Linking

Rely on the module linking proposal for spawning new children.

https://github.com/webassembly/module-linking

Switch allocator to mimalloc

The default allocator in musl is slow under concurrency. mimalloc is much faster.

Add support for Lists

Lists are a synonymous with recursion in many functional languages.
Lists are typically defined by the following grammar:

Empty  = ɛ ;
List   = Empty | Member List ;
Member = ANY ;

In Tortuga, tuples allow us to define lists as:

Empty  = "{" "}" ;
List   = Empty | "{" Member "," List "}" ;
Member = Tuple | Number | Function | Interval ;

That means lists are nothing more than syntactic sugar on nested tuples limited to a cardinality of 2 (first and rest).

The goal of this issue is to define syntactic sugar for defining and deconstructing list-like tuples. The benefit is that the new grammar would enable us to optimize list-like tuples in the future to minimize space usage.

Empty  = "{" "}" ;
List   = Empty | "{" Member | List "}" ;
Member = Tuple | Number | Function | Interval ;

Reduce the use of generics in functions to speed compilation

Currently, the compiler depends on AsyncRead, Unpin, and Debug in order to pass the code to be compiled to the lexer.
For emitting the compiler depends on AsyncWrite, Unpin, ?Sized, and Debug to pass around the output.

Currently I am using the async types from futures, but I would like to simplify this to using a library type or trait that encompasses the same. My concern is that as it stands switching between async libraries for reading and writing as well as sync vs. async io is a huge pain.

Best to start with the read path as that is the least used.

Improve the REPL

Improve code completion, and formatting of the REPL.
https://crates.io/crates/crossterm

Module Exports

Add exports to a program.

The exports is the return value of a program.
Programs that have a list of expressions (not continuations for the REPL) must end with a tuple that exports functions.

Example:

a(n) = n^2
a(x, y) = x*y

b(n) = 0 - n
b(x, y) = y - x

{|a| = 2, |b| = 2}

Docker image

Add a docker image to build tortuga programs.
Need version cmd for image tags.

See https://github.com/misalcedo/kv-store/blob/main/.github/workflows/build.yml

Missing tests for import section

Need an import validation

Define a Finite State Machine for the grammar

Implement the parser as a Finite State Machine that transitions on tokens and computes some context based on the transition (from state, to state, input).

This will allow us to define an LR(1) parser that is hand-written instead of relying on parser generators.

Proposal

The idea I have is to define a non-deterministic Finite State Machine by defining the sequence of tokens that comprise a grammar rule. The parser builder, then takes each of these rules together with their sequences and builds a NFSM that stores possible current states and the current sequence of tokens.

The parser drives the NFSM by passing it tokens until the EOF. The NFSM returns a Result<Option<Rule>, Error>. Attempting to transition after after EOF is an error.
The NFSM returns Ok(Some(Rule)) when a rule completes and Ok(None)) when in a non-terminal state. To implement panic mode, define a synchronization rule in the NFSM and on an error the parser gets a synchronization iterator. The iterator will discard one of its tokens until the sync rule matches and the NFSM is in a non-terminal state (consumes as many of its token buffer as possible).

Logging

Add logging to the crate.

Functions should lose their binding

Currently the binding makes it so a function cannot reference any functions defined after it's declaration. That's why I needed to override the function to support factorial. Only the old factorial definition existed.

A better implementation is to do multiple passes to determine the names at leach scope level.

Test buffers vs. counting emitter

Benchmark the differences between writing to a buffer, then copying, vs. counting then writing again for emitting to binary.

Expand the grammar

The next step is to expand the grammar to include:

variable constraints
functions
statements
chained comparison (e.g. 1 < 2 < 7)

Also, will need to expand the interpreter to have state so that constrained variables can be referenced in new entries (allowing for multiline programs to be split into multiple entries). The interpreter should not allow overrides to match compiled behavior.

Syntax highlight

Implement syntax highlighting for code samples shown in the book.