Giter VIP home page Giter VIP logo

tortuga's Introduction

📢 About Me

Hi, my name is Miguel Salcedo. I am currently working as a Software Developer at GitHub. Before that, I worked at Amazon for close to 6 years.

🎮 Hobbies

I love to program on my latest idea, watch anime and play video games. Although my 👶 daughter monopolizes most of my time during the day, I still often spend my nights in front of a 💻 or 📺 screen. I start most of my mornings with a cup of ☕ and will have 1 or 2 more before the day is done.

  • ⚡ Fun fact: I don't remember the last time I read a book that was not software-related. Though, I do know it was a book in the Witcher series.
  • ⚡ Fun fact: In the order I learned them, I speak: EspañolEnglish and some 日本語.

📋 My Projects

tortuga's People

Contributors

dependabot[bot] avatar misalcedo avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

tortuga's Issues

Extend AST

The goal is to mimic erlang processes here with a process hierarchy.
exit criteria is being able to implement a simple ping pong that toggles between the numbers 1 and -1.
Also add support for negative numbers for the whole part.

processes should be able to spawn any number of children, but also be able to define children to spawn at the start of its lifecycle (before any messages are sent in the WASM start function).

Update workflows

Update lint and release workflows to use cargo cache and actions-rs actions.

Implement async LEB128

Implement my own LEB128 writing using the C-like psuedo-code here: https://en.wikipedia.org/wiki/LEB128
Since I never read the values myself (done by wasmtime), I do not need the read functionality and can remove a fairly simple dependency.

Encode unsigned integer

do {
  byte = low-order 7 bits of value;
  value >>= 7;
  if (value != 0) /* more bytes to come */
    set high-order bit of byte;
  emit byte;
} while (value != 0);

Encode signed integer

more = 1;
negative = (value < 0);

/* the size in bits of the variable value, e.g., 64 if value's type is int64_t */
size = no. of bits in signed integer;

while (more) {
  byte = low-order 7 bits of value;
  value >>= 7;
  /* the following is only necessary if the implementation of >>= uses a
     logical shift rather than an arithmetic shift for a signed left operand */
  if (negative)
    value |= (~0 << (size - 7)); /* sign extend */

  /* sign bit of byte is second high-order bit (0x40) */
  if ((value == 0 && sign bit of byte is clear) || (value == -1 && sign bit of byte is set))
    more = 0;
  else
    set high-order bit of byte;
  emit byte;
}

Implement a runtime

Create a runtime for tortuga.

The runtime will expose an HTTP 2/3 interface and handle requests by creating a new module instance.

Modules are mapped to a uri and method pair. The headers, http version, and method are not passed to the actor. Only the request body and uri is passed.

Add a subcommand called start that runs the server.

Read CLI banner information from Cargo

Create a build script to output cargo information as singleton instance of a struct object.
The main file will load the included module to read all of the cli infomation.

Implement AST (Abstract Syntax Tree)

Actor

  • Name - An URI used to reference this actor.
  • Signature - The bit-string interpretation of the incoming message.
    Messages are verified for expected length. No other verification is performed.
    Any serialization happens at the system level.
  • Intent - The list of expressions to perform in response to a given message.
  • Children - Dependent actors that must be initialized in order to process a message.

Implementation Notes

  • Use YAML to encode AST in order to experiment with different syntaxes without having to design a new language lexer and parser each time.
  • Start with single intent per actor.
  • Only support a single message interpretation per actor.
  • Read actor dependencies into a Directed Acyclic Graph (DAG).
  • Actor paths are URIs. Not yet sure whether to make them relative paths or not yet, but must be a valid relative or absolute URI. Similar to Go in that relative paths are expected to be found only in the current project.
  • Define intent as a list of instructions with fields that define the instruction arguments (e.g., Send, Multiply, Negate, etc.)
  • Types for signature are number, bit vector. The signature assigns names to each portion of the message to be referenced in the intent.
  • Children maps a set of names to a path. Currently, there is a requirement that a path can only be mapped to a name once. This is to discourage implementing load balancing as part of an actor.

Create releases with binaries

Follow the example here.

https://github.com/rust-lang/mdBook/blob/master/.github/workflows/deploy.yml

Basically, we can trigger a workflow on a new release being created to build for each target and add the executables.

Ideally, could have a workflow that adds a tag with the current version if no release exists for that version. That way the tags are added automatically.

Then I can trigger releasing to crates.io and docker by manually creating a release for the auto-generated tag.

Implement MVP

Follow crafting interpreters to create a minimal language

Prevent simple infinite recursion in the interpreter.

The tree-walking interpreter currently allows infinite recursion in the form of:

@x = x

That code would cause the interpreter to panic with a stack overflow error.

The goal is to prevent recursion for functions with no arguments (i.e., nullary functions).

See https://en.wikipedia.org/wiki/Arity

Also n-ary functions with nullary function parameters must not recurse.

Lastly, should verify if we should recursively check for infinite recursion. In other words: should an n-ary function with n-ary function parameters that are nullary recurse (and so on recursively, how many levels deep should we check)?

Error Handling

Expand error messages to be pretty printed and have more variants.
Rust and PEST both print error messages with the line provided and the lexeme underlined.

Async

Add async support in order to compile each file in its own task.

Look to using tokio and futures for async runtime.
As much as possible, we should avoid depending on any non-std imports that are cross-cutting. Where necessary, we should depend on types in futures instead of tokio to make it easier to switch runtimes.

Expand the grammar to match on natural numbers

Add a pattern to only allow natural numbers instead of real numbers.

Some example patterns:

natural = name ".0" ;
natural = name % 1 = 0 ;

Still an open question what the pattern should be.

Add panic mode to the parser

Allow the parser to identify multiple errors in an input file by synchronizing upon encountering an error. Synchronization means attempting to parse an expression, then skipping a token each time that does not work.

Most helpful with a backtracking parser as we can guarantee only one token is skipped on each synchronization pass.

Semantic analysis

No matching pattern

Infer return types of functions and validate that at least one pattern matches.

Function used only if in scope

Verify a function name is only referenced if the function has been defined.

Range endpoints in increasing order

For endpoints x, y x <= y.

Limit patterns to const arithmetic.

An arithmetic expression can be used within a pattern if it meets both of the following two conditions:
* It uses only numeric or bitwise operators.
* Its value can be evaluated to a constant when complied.

Fix windows releases

Windows releases currently upload a tar gz. Need to figure out why and switch it to a 7z.

Update documentation

  • Document the base types of the language: natural & real numbers, intervals, tuples, and functions.
  • Update the examples with the new grammar.

Fix released binaries

Linux release fails on missing glibc.
Also target/release should be removed from the tarball.

Create a C-based WASM example

In order for the runtime to be multi-language I need to define an interface that works well with any WASM language. For now, the best option is C. Though, Rust and Assembly script would be good candidates.

Expand grammar to support tuples

Creating a tuple looks like this:

tuple  = "{" members? "}" ;
values = expression ( "," expression )* ;

To match a tuple as a pattern:

explode = "{" members? "}" ;
members = pattern ( "," pattern )* ;

We also need to expand assignment to be:

assignment = pattern "=" block ;

To do that we need to change how we match identifiers in a pattern.
This is still an open question.

Benchmark

Add a Fibonacci number benchmark similar to the one in Lox.

@fibonacci(@n <= 1) = n
@fibonacci(@n) = fibonacci(n - 2) + fibonacci(n - 1)

@loop(@i = 0) = fibonacci(I)
@loop(@i) = [
  fibonacci(i)
  loop(i)
]

Make wasm syntax easier to work with

Currently the syntax is verbose and performs no validation on the cross-references between types.

One way is by adding constructor methods to each module sub type on module that's returns an index type. The index type implements into usize when paired on a tuple with a module.

Expand grammar to support byte strings

Bytes strings can be their own grammar or an extension on tuples. For now, I think making them built on top of tuples with some bit-specific patterns would work best.

Grammar:

Bytes         = { "\"" ~ Byte* ~ "\"" }
Byte          = _{ "0" | DecimalByte | BinaryByte | CodePointByte | HexByte }
HexByte       = @{ "16#"? ~ ASCII_HEX_DIGIT{2} }
DecimalByte   = @{ "10#" ~ ASCII_NONZERO_DIGIT ~ ASCII_DIGIT{0, 2} }
BinaryByte    = @{ "2#" ~ ("0" | "1" ){8} }
CodePointByte = @{ "'" ~ CodePoint ~ "'" }
CodePoint     = { "\\n" | "\\r" | "\\t" | "\\0" | ANY }

Example:

bytes = "FF '"' 10#32 '\' 10#256 2#01001000 ' ' AA '\n' ''' '\0'"
;"FF x:4 y:4 | rest"

Remove `Value::Unit`

Switch overloaded operators to return a Result<Value, RuntimeError> instead of Value::Unit. This allows better error messaging in the REPL.

Will require a new error about being unable to perform an operation on 2 values of some types.

Update docker tags

Don't update latest on PRs only on push to main.

Use a build of Tortuga with version command to get the proper tag.

Could make existing workflow only run on push to main and add docker steps to build.yml

Add support for Lists

Lists are a synonymous with recursion in many functional languages.
Lists are typically defined by the following grammar:

Empty  = ɛ ;
List   = Empty | Member List ;
Member = ANY ;

In Tortuga, tuples allow us to define lists as:

Empty  = "{" "}" ;
List   = Empty | "{" Member "," List "}" ;
Member = Tuple | Number | Function | Interval ;

That means lists are nothing more than syntactic sugar on nested tuples limited to a cardinality of 2 (first and rest).

The goal of this issue is to define syntactic sugar for defining and deconstructing list-like tuples. The benefit is that the new grammar would enable us to optimize list-like tuples in the future to minimize space usage.

Empty  = "{" "}" ;
List   = Empty | "{" Member | List "}" ;
Member = Tuple | Number | Function | Interval ;

Reduce the use of generics in functions to speed compilation

Currently, the compiler depends on AsyncRead, Unpin, and Debug in order to pass the code to be compiled to the lexer.
For emitting the compiler depends on AsyncWrite, Unpin, ?Sized, and Debug to pass around the output.

Currently I am using the async types from futures, but I would like to simplify this to using a library type or trait that encompasses the same. My concern is that as it stands switching between async libraries for reading and writing as well as sync vs. async io is a huge pain.

Best to start with the read path as that is the least used.

Module Exports

Add exports to a program.

The exports is the return value of a program.
Programs that have a list of expressions (not continuations for the REPL) must end with a tuple that exports functions.

Example:

a(n) = n^2
a(x, y) = x*y

b(n) = 0 - n
b(x, y) = y - x

{|a| = 2, |b| = 2}

Define a Finite State Machine for the grammar

Implement the parser as a Finite State Machine that transitions on tokens and computes some context based on the transition (from state, to state, input).

This will allow us to define an LR(1) parser that is hand-written instead of relying on parser generators.

Proposal

The idea I have is to define a non-deterministic Finite State Machine by defining the sequence of tokens that comprise a grammar rule. The parser builder, then takes each of these rules together with their sequences and builds a NFSM that stores possible current states and the current sequence of tokens.

The parser drives the NFSM by passing it tokens until the EOF. The NFSM returns a Result<Option<Rule>, Error>. Attempting to transition after after EOF is an error.
The NFSM returns Ok(Some(Rule)) when a rule completes and Ok(None)) when in a non-terminal state. To implement panic mode, define a synchronization rule in the NFSM and on an error the parser gets a synchronization iterator. The iterator will discard one of its tokens until the sync rule matches and the NFSM is in a non-terminal state (consumes as many of its token buffer as possible).

Logging

Add logging to the crate.

Functions should lose their binding

Currently the binding makes it so a function cannot reference any functions defined after it's declaration. That's why I needed to override the function to support factorial. Only the old factorial definition existed.

A better implementation is to do multiple passes to determine the names at leach scope level.

Expand the grammar

The next step is to expand the grammar to include:

  • variable constraints
  • functions
  • statements
  • chained comparison (e.g. 1 < 2 < 7)

Also, will need to expand the interpreter to have state so that constrained variables can be referenced in new entries (allowing for multiline programs to be split into multiple entries). The interpreter should not allow overrides to match compiled behavior.

Syntax highlight

Implement syntax highlighting for code samples shown in the book.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.