rouge's People

Contributors

ashtonsnapp

rouge's Issues

Memory Management

Most programming languages provide some mechanism for automatic memory management. A large majority of languages handle memory allocation for the programmer automatically (C++ and Rust, for example, use RAII: Resource Acquisition Is Initialization), but there are multiple strategies for automatic de-allocation.

Rust, the language Rouge is being written in, uses its ownership and borrowing system both for automatic de-allocation and for its safety guarantees. It's an extremely flexible system that still gives you a level of control while making sure you don't shoot yourself in the foot, but dealing with its quirks can feel like slamming your head into a brick wall (and I'm saying this as someone who really likes Rust). Other languages use various forms of garbage collection, most commonly tracing garbage collection, which is what most people mean when they say "garbage collection". However, you can't count on a tracing garbage collector freeing memory in a timely manner, and there are multiple algorithms and strategies for implementing one.
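To make the ownership approach concrete, here is a minimal Rust sketch (Rust as the host language, not a Rouge proposal) that logs when values are freed. `Noisy`, `consume`, and `drop_order` are hypothetical names used only for illustration.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Appends its name to a shared log when dropped, so we can see
/// exactly when each value is de-allocated.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Takes ownership of its argument; it is dropped when this function returns.
fn consume(_value: Noisy) {}

/// Returns the order in which two values were dropped.
fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let a = Noisy { name: "a", log: Rc::clone(&log) };
        let _b = Noisy { name: "b", log: Rc::clone(&log) };
        // Moving `a` into `consume` hands over ownership, so `a` is freed
        // inside `consume`, before `_b`.
        consume(a);
        // `_b` is freed here, at the end of the scope that owns it.
    }
    let order = log.borrow().clone();
    order
}

fn main() {
    println!("{:?}", drop_order());
}
```

De-allocation here is deterministic: it follows scopes and moves, with no collector running in the background.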

A much simpler solution is reference counting. You probably have an idea of how this works: each object has a number associated with it that indicates how many references to it exist. When that number hits 0, the object is de-allocated immediately. This extremely simple form of garbage collection is used by Apple's Swift programming language, and I'm thinking of using it in Rouge.
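Rust's single-threaded `std::rc::Rc` works this way, so it gives a quick feel for the mechanism. The sketch below only illustrates the counting; `count_steps` is a hypothetical helper, and nothing here dictates how Rouge would store its counts.

```rust
use std::rc::Rc;

/// Records the strong reference count at each step.
fn count_steps() -> Vec<usize> {
    let mut steps = Vec::new();
    let first = Rc::new(String::from("shared data"));
    steps.push(Rc::strong_count(&first)); // only `first`: 1
    {
        let second = Rc::clone(&first);
        steps.push(Rc::strong_count(&second)); // `first` and `second`: 2
    } // `second` is dropped here, decrementing the count
    steps.push(Rc::strong_count(&first)); // back to 1
    // When `first` is dropped at the end of this function, the count
    // reaches 0 and the String is freed immediately.
    steps
}

fn main() {
    println!("{:?}", count_steps());
}
```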

Reference counting does come with its own problems, however. Firstly, two objects can hold references to each other, forming a reference cycle. Neither count can ever reach 0, so neither object is automatically de-allocated. CPython handles this with a cycle-detecting garbage collector, but another solution is to use weak references: references that don't contribute to the reference count and become invalid when the referenced object is de-allocated. Weak references could be introduced either by somehow detecting where one is needed at interpret or compile time, or by adding a keyword that lets developers decide when to use them. Beyond cycles, the count has to be updated atomically to be thread-safe, it takes up memory of its own, and every update to it costs some performance.
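Here is a small Rust sketch of both halves of the cycle problem, using `Rc` for strong references and `Weak` for weak ones. `Node` and `still_alive_after_scope` are hypothetical names. The function builds a two-node cycle and then checks, through a weak probe, whether node `a` survived once both local handles were gone.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// A node that can point at another node, either strongly or weakly.
struct Node {
    strong_next: RefCell<Option<Rc<Node>>>,
    weak_next: RefCell<Option<Weak<Node>>>,
}

impl Node {
    fn new() -> Rc<Node> {
        Rc::new(Node {
            strong_next: RefCell::new(None),
            weak_next: RefCell::new(None),
        })
    }
}

/// Links a -> b strongly, and b -> a either strongly (a cycle) or weakly.
/// Returns true if `a` is still allocated after the local handles are dropped.
fn still_alive_after_scope(use_weak_back_edge: bool) -> bool {
    let probe: Weak<Node>;
    {
        let a = Node::new();
        let b = Node::new();
        *a.strong_next.borrow_mut() = Some(Rc::clone(&b));
        if use_weak_back_edge {
            // The weak edge does not raise a's strong count.
            *b.weak_next.borrow_mut() = Some(Rc::downgrade(&a));
        } else {
            // The strong edge keeps a's count above 0 forever: a leak.
            *b.strong_next.borrow_mut() = Some(Rc::clone(&a));
        }
        probe = Rc::downgrade(&a);
    }
    // A weak reference can only be upgraded while its target is alive.
    let alive = probe.upgrade().is_some();
    alive
}

fn main() {
    println!("strong back-edge, still alive: {}", still_alive_after_scope(false));
    println!("weak back-edge, still alive: {}", still_alive_after_scope(true));
}
```

On the thread-safety point, Rust's `Arc` is the atomically counted counterpart of `Rc`; `Rc` is not `Send`, so the compiler refuses to share it across threads.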

There are also more specific problems dealing with memory and thread safety. With reference counting by itself, there's nothing stopping the code holding those references from all trying to modify the object at the same time. Considering this, the first solution that came to mind was implementing copy-on-write semantics. Essentially, when multiple references exist for a given object, any attempted modification through those references causes a copy of the object to be produced. The reference used is then switched to the copy (decrementing the original's reference count in the process), and the copy is modified instead of the original object.
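Rust's `Rc::make_mut` happens to implement very nearly this copy-on-write rule, so it can serve as a stand-in for experimenting with the idea. In the sketch below, `cow_push` and `demo` are hypothetical helpers, not a proposed Rouge API.

```rust
use std::rc::Rc;

/// Pushes through `handle` with copy-on-write semantics: if other strong
/// references to the vector exist, `Rc::make_mut` clones it, repoints
/// `handle` at the clone (decrementing the original's count), and the
/// push modifies only the clone.
fn cow_push(handle: &mut Rc<Vec<i32>>, value: i32) {
    Rc::make_mut(handle).push(value);
}

/// Returns (original contents, writer contents, original's final count).
fn demo() -> (Vec<i32>, Vec<i32>, usize) {
    let original = Rc::new(vec![1, 2, 3]);
    let mut writer = Rc::clone(&original); // two references: count is 2
    cow_push(&mut writer, 4); // triggers a copy
    // `original` is untouched and is again the only reference to its data.
    ((*original).clone(), (*writer).clone(), Rc::strong_count(&original))
}

fn main() {
    println!("{:?}", demo());
}
```

If `writer` were the only reference, `make_mut` would mutate in place without copying, which matches the "when multiple references exist" condition above.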

This works, but it introduces some programming challenges. Suppose you want to implement a multi-producer single-consumer channel, which will probably be provided by the standard library eventually but isn't yet. You need multiple transmitter objects, a single receiver object, and a buffer or queue that the transmitters can push messages to and the receiver can pull them from. However, given the mechanisms discussed so far, this likely wouldn't be possible: the buffer is shared by several references, so every push through a transmitter would modify a copy, and the receiver would never see the message.
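To make the problem concrete, here is a Rust sketch that again uses `Rc::make_mut` as a stand-in for the proposed copy-on-write rule. Both function names are hypothetical. The first shows transmitters writing into private copies; the second shows what an MPSC buffer actually needs, namely shared mutable state (here `Arc<Mutex<VecDeque>>`), which copy-on-write alone does not provide.

```rust
use std::collections::VecDeque;
use std::rc::Rc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Copy-on-write "channel": every push finds other references to the
/// buffer, so `make_mut` copies it and the receiver's buffer stays empty.
fn cow_channel_received() -> Vec<i32> {
    let receiver: Rc<VecDeque<i32>> = Rc::new(VecDeque::new());
    let mut tx1 = Rc::clone(&receiver);
    let mut tx2 = Rc::clone(&receiver);
    Rc::make_mut(&mut tx1).push_back(1); // lands in tx1's private copy
    Rc::make_mut(&mut tx2).push_back(2); // lands in tx2's private copy
    let received: Vec<i32> = receiver.iter().copied().collect();
    received
}

/// Shared-mutable channel: transmitters and receiver share one buffer
/// behind a lock, so every push is visible to the receiver.
fn shared_channel_received() -> Vec<i32> {
    let buffer: Arc<Mutex<VecDeque<i32>>> = Arc::new(Mutex::new(VecDeque::new()));
    let handles: Vec<_> = (1..=2)
        .map(|msg| {
            let tx = Arc::clone(&buffer);
            thread::spawn(move || {
                tx.lock().unwrap().push_back(msg);
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let mut received: Vec<i32> = buffer.lock().unwrap().drain(..).collect();
    received.sort(); // thread scheduling decides the arrival order
    received
}

fn main() {
    println!("copy-on-write: {:?}", cow_channel_received());
    println!("shared buffer: {:?}", shared_channel_received());
}
```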

The whole purpose of this issue is to get opinions on what I've thought of so far, as well as suggestions for how to handle memory management. Whether those suggestions are modifications to what I've proposed here or full-on alternatives, I'm open to hearing them.

Thank you for taking the time to read this text vomit, and have an awesome day.

Syntax and the Type System

As a recap of the conversation in #1, I'm considering moving towards a more unified type syntax. Rather than having separate class and enum keywords, the type keyword, previously used only for type aliases, will now handle all type definitions as well. In addition, there will be a switch from an inheritance model to a mixin model. However, instead of having dedicated mixin definitions, any type can be inlined into any other using a new inline keyword. There will also be a definite move away from using colons to separate signatures from their bodies, and towards making commas optional in multi-line blocks, since newlines are (at least currently) significant.

Type Definitions

With that out of the way, I'm currently considering how to handle the syntax of type definitions more generally. I've come up with two options that I would like opinions on, but if you have any other suggestions, feel free to leave them.

Option A

Option A is the most similar to what is present now, and is more similar to Rust. I still need to update the publicly-readable README, but it would look something like this:

# Unit-like type.
type Empty

# Single-line definition of a tuple-like type, commas would be required for separating field definitions.
type Vec2D is (float, float)

# Multi-line definition of a tuple-like type, commas are optional but used for clarity.
type Vec3D is (
    float,
    float,
    float
)

# Single-line definition of a record-like type, commas would be required for separating field definitions.
type Person is string name, nat age

# Multi-line definition of a record-like type, commas are optional but used for clarity.
type Student is
    string name,
    nat year,
    [string: float] grades
end

# Single-line definition of a variant type, commas would be required for separating variant definitions.
type Optional<T> is None, Some(T)

# Multi-line definition of a variant type, commas are optional but used for clarity.
type Result<T, E> is
    Ok(T),
    Err(E)
end

And here's an example with type inlining:

type Transform3D is
    Vec3D translation,
    Quat rotation,
    Vec3D scale
end

type Entity is
    inline Transform3D,
    string name,
    float health,
    float max_health
end
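Since Option A is modelled on Rust, here is a rough Rust rendering of the same definitions for comparison. The mappings are assumptions: `float` is taken to be `f64`, `nat` an unsigned integer, `[string: float]` a hash map, and `Quat` is a placeholder because the example doesn't define it. Rust has no `inline`, so composition is the closest equivalent.

```rust
use std::collections::HashMap;

struct Empty; // unit-like

struct Vec2D(f64, f64); // tuple-like
struct Vec3D(f64, f64, f64);

struct Person {
    name: String,
    age: u32, // `nat` assumed to be an unsigned integer
}

struct Student {
    name: String,
    year: u32,
    grades: HashMap<String, f64>, // `[string: float]` assumed to be a map
}

// Variant type; Rust's own `Option` and `Result` are defined the same way.
enum Optional<T> {
    None,
    Some(T),
}

// Placeholder for `Quat` (fields assumed).
struct Quat(f64, f64, f64, f64);

struct Transform3D {
    translation: Vec3D,
    rotation: Quat,
    scale: Vec3D,
}

// Without `inline`, the transform's fields sit behind a named field.
struct Entity {
    transform: Transform3D,
    name: String,
    health: f64,
    max_health: f64,
}

fn translation_x(entity: &Entity) -> f64 {
    // With `inline Transform3D`, this would presumably be `entity.translation`.
    entity.transform.translation.0
}

fn main() {
    let player = Entity {
        transform: Transform3D {
            translation: Vec3D(1.0, 2.0, 3.0),
            rotation: Quat(0.0, 0.0, 0.0, 1.0),
            scale: Vec3D(1.0, 1.0, 1.0),
        },
        name: String::from("player"),
        health: 10.0,
        max_health: 10.0,
    };
    println!("{} x = {}", player.name, translation_x(&player));
}
```

The extra `.transform` hop is exactly what inlining would remove.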

Option B

Option B replaces the commas with keywords or operators: and/& for tuple-like and record-like types, or/| for variant types. The keywords/operators could possibly be made optional in multi-line type definitions, but it may make more sense to keep them for consistency or to allow for a move to total whitespace insignificance. The parentheses around tuple-like type definitions would also be dropped. This would be more similar to purely functional languages, like Haskell.

This could also make it possible to define types that mix different styles of definition: you could mix tuple-like and record-like types by giving only some fields explicit names, and you could make a type that's partially variant instead of creating a variant type plus a wrapper type holding all the common fields (although I'm not sure about the potential syntax for that). I'm using the keywords here instead of the symbols because that's more consistent with languages often used for scripting (a use case Rouge also aims for), such as Python, Lua, and Ruby.

# Unit-like type
type Empty

# Single-line definition of a tuple-like type
type Vec2D is float and float

# Multi-line definition of a tuple-like type
type Vec3D is
    float
    and float
    and float
end

# Single-line definition of a record-like type
type Person is string name and nat age

# Multi-line definition of a record-like type
type Student is
    string name
    and nat year
    and [string: float] grades
end

# Single-line definition of a variant type
type Option<T> is None or Some T

# Multi-line definition of a variant type
type Result<T, E> is
    Ok T
    or Err E
end

And here's an example with type inlining:

type Transform3D is
    Vec3D translation
    and Quat rotation
    and Vec3D scale
end

type Entity is
    inline Transform3D
    and string name
    and float health
    and float max_health
end
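In Option B, and builds product types (like a Rust struct or tuple) and or builds sum types (like a Rust enum). A partially variant type would replace a workaround that Rust, and Option A, express as a wrapper struct around an enum. The sketch below shows that workaround; `Shape`, `ShapeKind`, and `area` are hypothetical, and the Rouge syntax for the partially variant alternative is still undecided.

```rust
// Every shape shares a position, but the remaining fields differ per variant.
// Without partially variant types, the common fields go in a wrapper struct
// and the differing part becomes a separate enum.
enum ShapeKind {
    Circle { radius: f64 },
    Rect { width: f64, height: f64 },
}

struct Shape {
    x: f64, // common to every variant
    y: f64,
    kind: ShapeKind, // the variant-specific part
}

fn area(shape: &Shape) -> f64 {
    match shape.kind {
        ShapeKind::Circle { radius } => std::f64::consts::PI * radius * radius,
        ShapeKind::Rect { width, height } => width * height,
    }
}

fn main() {
    let r = Shape { x: 1.0, y: 2.0, kind: ShapeKind::Rect { width: 2.0, height: 3.0 } };
    println!("rect at ({}, {}) has area {}", r.x, r.y, area(&r));
}
```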

Type aliasing

There is also the question of how to handle type aliasing, since the type keyword will still handle that. There are two options for this as well.

First, type aliases can be differentiated from type definitions by using = instead of is, like so:

type Result<T> = std:Result<T, Error>

Alternatively, we can use is and consider any definition of a tuple-like type with only one field to be a type alias, like so:

type Result<T> is std:Result<T, Error>
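One consequence of the second option is worth weighing: if every single-field tuple-like type is an alias, there would presumably be no way to declare a distinct wrapper ("newtype") around a single type. Rust keeps the two separate, as this sketch shows; `Error`, `Checked`, `parse_positive`, `describe`, and `unwrap_checked` are hypothetical names.

```rust
/// Hypothetical error type standing in for Rouge's `Error`.
#[derive(Debug, PartialEq)]
struct Error(String);

// Like the `=` option: a true alias, interchangeable with the full type.
type Result<T> = std::result::Result<T, Error>;

// A single-field tuple struct ("newtype"): a distinct type that wraps one.
struct Checked<T>(std::result::Result<T, Error>);

fn parse_positive(n: i64) -> Result<u64> {
    if n > 0 {
        Ok(n as u64)
    } else {
        Err(Error(format!("{n} is not positive")))
    }
}

/// Takes the full type; a value of the alias type can be passed directly.
fn describe(r: &std::result::Result<u64, Error>) -> String {
    match r {
        Ok(v) => format!("ok: {v}"),
        Err(Error(msg)) => format!("error: {msg}"),
    }
}

/// A newtype must be unwrapped explicitly to reach the inner value.
fn unwrap_checked(c: Checked<u64>) -> std::result::Result<u64, Error> {
    c.0
}

fn main() {
    println!("{}", describe(&parse_positive(5)));
    // `describe(&Checked(parse_positive(5)))` would not compile: different type.
    println!("{}", describe(&unwrap_checked(Checked(parse_positive(-1)))));
}
```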

Regular expression bug found in lexer

Bug Overview

A new unit test added to the lexer module has caught a critical bug in one of the regular expressions (regexes) used by the lexer to send tokens to various callback functions.

This test tries to lex the examples/hello.rouge file, which contains a simple Hello World program. When it reaches the "Hello world!" string literal, the lexer matches the literal together with the trailing ), which closes the argument list of outl!, as a single token. That combined slice is then rejected as an invalid token.

cargo test Output

running 1 test
test compiler::lexer::tests::lex_hello ... FAILED

failures:

---- compiler::lexer::tests::lex_hello stdout ----
thread 'compiler::lexer::tests::lex_hello' panicked at 'called `Result::unwrap()` on an `Err` value: [Error { is_warning: false, file: Some("examples/hello.rouge"), line: Some(1), span: Some(7..22), slice: Some("\"Hello world!\")"), kind: Interpret(Lex(InvalidToken)) }]', src/compiler/lexer.rs:836:83
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

failures:
    compiler::lexer::tests::lex_hello

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

Possible Causes

I am working on the assumption that the problem lies in the regular expression itself. My attempts to debug the regular expression have only resulted in me filing a bug report on the GitHub page for RegExr, as it wouldn't accept extended Unicode escapes with hexadecimal digits. However, Logos' regex parser is known to have some issues, such as \u{0}-\u{10FFFF} matching any byte instead of any Unicode character, so the problem may be on their end. I would like to see whether there's anything I can do myself before handing it over to them, though.
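While narrowing down the regex, it may help to compare it against a hand-written reference for where a string literal should end. This sketch assumes Rouge strings are double-quoted with backslash escapes, which the issue doesn't specify. `string_literal_len` is a hypothetical helper, not part of the lexer. Because it iterates by `char` rather than by byte, it also sidesteps the any-byte versus any-character distinction mentioned above.

```rust
/// Returns the byte length of the double-quoted string literal at the start
/// of `input` (quotes included), or `None` if there is none or it never closes.
fn string_literal_len(input: &str) -> Option<usize> {
    let mut chars = input.char_indices();
    // The literal must open with a double quote.
    match chars.next() {
        Some((_, '"')) => {}
        _ => return None,
    }
    let mut escaped = false;
    for (i, c) in chars {
        if escaped {
            escaped = false; // an escaped character never ends the literal
        } else if c == '\\' {
            escaped = true;
        } else if c == '"' {
            return Some(i + c.len_utf8()); // stop at the first unescaped quote
        }
    }
    None
}

fn main() {
    let source = "\"Hello world!\")";
    if let Some(len) = string_literal_len(source) {
        println!("literal: {}  rest: {}", &source[..len], &source[len..]);
    }
}
```

For the failing input, a literal starting at byte 7 should therefore end at byte 21, leaving the ) at 21..22 for its own token; the reported span of 7..22 shows the regex consuming that extra character.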
