jayconrod / gypsum
An experimental new programming language
License: Other
Use pylint to locate and fix potential errors.
Use yapf to reformat code.
We should use PEP8, but we'll probably need some customization, since I don't want to make huge changes to the compiler or make it more difficult to edit. At minimum, we'll probably stick with camel case identifiers and 96 columns (or go up to 100).
`make check-compiler` should ensure that pylint has been run and that yapf has no effect.
Many instructions can throw built-in exceptions. For example, instructions dealing with memory can throw `NullPointerException`s if the receiver is null. Pointer maps are recorded for these instructions, since they may need to allocate exceptions, and that can trigger garbage collection.
Pointer maps encode the length of the stack. That means that when executing these instructions, we can't pop anything off the stack until we're sure we don't need to do garbage collection. Currently, the interpreter is not consistent about this.
It would also be good to pre-allocate built-in exceptions, so we don't need to record pointer maps for instructions that may throw them.
`__all__` declarations should be used in the compiler's modules (e.g. `declarations.combinators` and `ir_types`).
Example:
let f = lambda (x: i64) x
let g = f(12)
class C(f: Function1[String, String])
def f(c: C, s: String) = c.f(s)
This is blocked by #45. We need to be able to use more than one definition from a single AST node. In particular, we need to reference the variable or field containing the closure (which may need to be captured or externalized), and we need to reference the closure method being called (which may need to be externalized).
Storing strings as sequences of 32-bit code points is inefficient in terms of memory use compared with UTF-8. Most common string operations (search, split, comparing for equality) work just as well if not better on UTF-8. Operations that do require code points (collation, normalization, other locale-specific stuff) generally work fine with iteration and do not need random access.
The built-in `String` class should represent strings as an array of UTF-8 bytes. Gypsum and CodeSwitch will both need to be updated.
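To illustrate the claim above, here is a minimal Python sketch showing that search and split work directly on UTF-8 bytes, while code points are still recoverable by iteration when needed. The function names are illustrative only, not part of any Gypsum or CodeSwitch API.

```python
# Hypothetical sketch: common string operations on UTF-8 bytes,
# without decoding to code points first.

def utf8_find(haystack: bytes, needle: bytes) -> int:
    # Byte-wise search is correct for UTF-8: the encoding is
    # self-synchronizing, so one character's encoding never appears
    # inside another's.
    return haystack.find(needle)

def utf8_split(s: bytes, sep: bytes):
    # Splitting on a byte sequence works without decoding.
    return s.split(sep)

def utf8_code_points(s: bytes):
    # Iteration recovers code points when they are genuinely needed
    # (collation, normalization); no random access is required.
    return [ord(c) for c in s.decode("utf-8")]
```

Note that `utf8_find` returns a byte offset, not a code-point index, which is usually what subsequent byte-oriented operations want anyway.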
The flag should be set automatically in declaration analysis.
Likewise, if we are calling an inherited non-final method on a receiver which has a final class type, the method should be treated as final. Nothing is going to override it.
`std.Exception` should accept a string message in its constructor, and there should be a public method to retrieve this message. `to-string` should also be implemented to include the exception class name and this message.
The CodeSwitch API should provide a convenient way to call `to-string` for any `Object`. `Object::toString` can return a `String` directly.
`codeswitch::Exception` and `codeswitch::Error` should provide more convenience methods on top of this. Something that returns `std::string` would be nice.
Tests generated with `mktest.py` should use these methods and should print something reasonable when an interpreted test fails.
Gypsum should have an expression which casts an object type to another object type lower on the type lattice. This is not safe by definition, so the cast will include a run-time type test, and an exception will be thrown if the type test fails.
It's already possible to accomplish this through pattern matching, but it's inconvenient:
def cast-old(obj: Object): String =
match (obj)
case s: String => s
case _ => throw CastException()
def cast-new(obj: Object) = obj as String
With dynamic type parameterization (#36), we can just have a method in `Object` to do this, but this expression is much easier to implement in the short term.
If the same symbol is imported in multiple files in the same package, the Gypsum compiler currently reports an error "already declared". This is because the compiler considers all files to be in the same scope.
We do want definitions defined in the global scope of a file to be visible in other files without any explicit import or qualification. However, imports should only be visible within the file that contains them. So we can't simply define a scope for each file. Maybe we can define a scope for each file, then implicitly import top-level definitions into all other file scopes, then process explicit imports in each file scope.
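The proposal above can be sketched with plain dicts: one scope per file, implicit cross-file visibility for top-level definitions, and file-local explicit imports. All names here are hypothetical; the real compiler's scope classes differ.

```python
# Minimal sketch of the proposed per-file scoping.

class FileScope:
    def __init__(self):
        self.top_level = {}   # definitions declared in this file
        self.imports = {}     # explicit imports, visible only here

    def bind(self, name, defn):
        self.top_level[name] = defn

    def import_(self, name, defn):
        self.imports[name] = defn

def lookup(name, this_file, all_files):
    # Explicit imports are consulted only for the file that wrote
    # them; top-level definitions are visible from every file.
    if name in this_file.imports:
        return this_file.imports[name]
    for f in all_files:
        if name in f.top_level:
            return f.top_level[name]
    return None
```

With this shape, importing the same symbol in two files binds it in two independent `imports` dicts, so no "already declared" collision occurs.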
Annoying spelling mistake.
Currently, we don't report an error when overloaded functions have the same type. We just report an error when one of these functions is called because the call is always ambiguous.
We need all definitions in a package to have unique names, and overloaded functions with the same parameter types will have the same name.
This error may be reported at the end of type declaration analysis, before we assert names are unique.
Need to add the following methods to `List`, `Dict`, `Set`, `Iter`, and `Iterator`.
It may make sense to implement these in a common base trait.
Getting an error like the one below when building with `make build-compiler`:
ValueError: script '/home/jay/Code/gypsum/compiler/compiler/compiler' not found
pyinstaller 3.0 is the default when installed with `pip install PyInstaller`. As a workaround, use `pip install PyInstaller==2.1`.
We might need some configuration process to tell what version of pyinstaller is present and modify the spec file accordingly.
When an error occurs, the compiler shouldn't completely bail out. It should proceed as far as it can and accumulate and print multiple errors. In general, this will happen within each phase. So if we get multiple errors in declaration analysis, we won't proceed to type analysis, but we can process all of the declarations.
Rather than being raised, errors should be reported to `CompileInfo`. In cases where errors still need to be thrown, they should be caught and handled.
This may also open the door to warnings, which would be reported in the same way.
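A sketch of the accumulate-then-check pattern described above, assuming a hypothetical reporter class (the real `CompileInfo` interface may look quite different):

```python
# Diagnostics are accumulated instead of raised immediately; the
# compiler stops only at a phase boundary, after reporting every
# error the finished phase produced.

class CompilePhaseFailed(Exception):
    def __init__(self, errors):
        super().__init__("%d error(s)" % len(errors))
        self.errors = errors

class Reporter:
    def __init__(self):
        self.errors = []
        self.warnings = []   # warnings fit the same channel

    def error(self, loc, message):
        self.errors.append((loc, message))

    def warning(self, loc, message):
        self.warnings.append((loc, message))

    def check_phase(self):
        # Called between phases, e.g. after declaration analysis
        # and before type analysis.
        if self.errors:
            raise CompilePhaseFailed(self.errors)
```

This is why warnings fall out almost for free: they go through the same `Reporter`, but `check_phase` ignores them.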
It should be possible to use an anonymous function as an expression.
let add = lambda (x: i32, y: i32) x + y
The types of lambda parameters may be required initially, but should eventually be inferred. Lambda functions will not be allowed to have type parameters. They will implement one of the `FunctionN` traits if their return type and parameter types are all object types.
Every error message should explain what went wrong and suggest possible solutions. For example:
If the error is on a single line, the compiler should print out the line and point to where the error occurred for context.
In Gypsum, it should be possible for a value to have a type that is a union of two or more types. For example:
class Foo
class Bar
def f(b: boolean): Foo | Bar = if (b) Foo() else Bar()
The type `Foo | Bar` is a union of `Foo` and `Bar`. `Foo` and `Bar` would each be subtypes of it, and for an instance of this type, it should be possible to call methods or access fields that are found in both types (those inherited from any common base class or trait).
This will be useful for exception handling (Java has something like this) and pattern matching in general. It is necessary for `Type.lub` to be properly implemented for traits.
Intersection types may also be useful, but I'm not sure when they would be used.
It should be possible for callers to write the names of some of the arguments being passed to a function. This will be particularly useful for functions with a lot of parameters, especially if many of them have default expressions. Named arguments don't need to be passed in order. No unnamed arguments can be passed after the first named argument.
def f(a: Object, b: String, c: i32 = 1, d: i32 = 2, e: i32 = 3)
f(Object(), d = 4, c = 3, b = "foo")
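The argument-to-parameter matching the compiler would need can be sketched as follows. This is an illustration, not the Gypsum implementation; `match_args` and its signature are hypothetical, and the "no unnamed after named" rule is assumed to be enforced by the parser, which is why positional and named arguments arrive separately.

```python
# Bind positional then named arguments to parameters, falling back
# to defaults for anything left unbound.

def match_args(params, defaults, positional, named):
    """params: parameter names in declaration order.
    defaults: dict of parameter name -> default value.
    Returns the full argument list in parameter order."""
    bound = dict(zip(params, positional))
    for name, value in named.items():
        if name in bound:
            raise TypeError("duplicate argument: " + name)
        if name not in params:
            raise TypeError("unknown parameter: " + name)
        bound[name] = value
    result = []
    for p in params:
        if p in bound:
            result.append(bound[p])
        elif p in defaults:
            result.append(defaults[p])
        else:
            raise TypeError("missing argument: " + p)
    return result
```

Running the document's example `f(Object(), d = 4, c = 3, b = "foo")` through this yields arguments in the order `a, b, c, d, e` with `e` taking its default.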
This is the old implementation of Some.try-match:
class Some[static +T](value: T) <: Option[T]
static def try-match(obj: Object): Option[T] =
match (obj)
case some: Some[_] => some
case _ => None
It's impossible for this function to tell whether `obj` is `Some[T]`, since we don't know at run-time what `T` is. The compiler should report an error.
The problem is that `_` is treated as the variable type `T`. It should be treated as an existential type which is bounded by `T`. If this were the case, the compiler would report an error on the `some` expression since `Some[_]` is not a subtype of `Option[T]`.
For `UseInfo`, we may sometimes use multiple definitions from a single AST node. For example, when calling a closure, we use the variable holding the closure (which may need to be captured), and we also use the function containing the implementation (which may need to be externalized).
For `DefnInfo`, we may want to create multiple definitions at the same node. I can't think of a situation where this is strictly necessary right now, but it may be useful in the future.
Tables in `CompileInfo` should basically be multimaps: a key may correspond to multiple values. It probably makes sense to track the order of the values, so that e.g. `getDefnInfo` would return the first value and `getDefnInfos` would return all of them.
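A minimal sketch of such a multimap table, keeping insertion order so the single-value accessor stays backward compatible. The class and method names mirror the ones mentioned above but are otherwise hypothetical.

```python
from collections import defaultdict

# Multimap-style table: each key maps to an ordered list of values.

class InfoTable:
    def __init__(self):
        self._table = defaultdict(list)

    def set(self, key, info):
        # Values accumulate per key in insertion order.
        self._table[key].append(info)

    def getDefnInfo(self, key):
        # First recorded value, matching the current single-value API.
        return self._table[key][0]

    def getDefnInfos(self, key):
        return list(self._table[key])
```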
The compiler should be able to statically infer types in many cases where they are tedious for the programmer to write. Currently, we infer types for variables and fields, since this is obvious and easy to do. I'd like to be able to infer types in the following situations:
We will not infer types for regular function definitions, other than return types of non-recursive functions, which we do already. Inference will only be performed within the scope of a function definition. This may include other functions through lambda expressions.
We will probably use something resembling Hindley-Milner type inference. During type definition analysis, for each unknown type, we'll create an inferred type object. Whenever we check if a type is a subtype of another, we'll add a constraint to the inferred type and proceed as if the check succeeded. If we need to perform a lookup for a field load or method call, we'll create a new inferred type object for the result of the expression and resolve the lookup later. At the end of type definition analysis, for each inferred type, we'll resolve lookups and pick the most general real type that satisfies all the constraints. If no such type exists, we'll report an error.
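The constraint-accumulation step can be illustrated with a toy sketch. This only shows the shape of the mechanism (unknowns collect subtype constraints, resolution happens at the end); real inference must also handle deferred lookups, variance, and error reporting, and all names here are hypothetical.

```python
# Toy version of the plan above: each unknown type records the
# types it must be a subtype of; resolution picks the most general
# type that still satisfies every recorded constraint.

class InferredType:
    def __init__(self):
        self.upper_bounds = []

    def constrain_subtype_of(self, ty):
        # Recorded instead of checked; the check "succeeds" for now.
        self.upper_bounds.append(ty)

def resolve(inferred, subtype_relation):
    # subtype_relation(a, b) means "a <: b". Among the bounds, the
    # least one is the most general solution: anything above it
    # would violate some other bound.
    for c in inferred.upper_bounds:
        if all(subtype_relation(c, b) for b in inferred.upper_bounds):
            return c
    raise TypeError("no type satisfies all constraints")
```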
To be designed.
Basically, I want something like C++ parameter packs. I'm frustrated with sets of classes like `Tuple2`, `Tuple3`, ... and traits like `Function0`, `Function1`, ... . It should be possible to write a single class, trait, or function that has an indefinite number of type parameters and matching fields or function parameters.
public trait Function[+R, -P...]
public def call(p...: P): R
public class Tuple[+T...](value...: T)
public override def to-string =
"(" + join(value...) + ")"
Gypsum should allow programmers to annotate definitions.
Annotations should be defined with the `annotation` keyword. Annotations may have parameters as part of their definition.
annotation Foo
annotation Bar(x: i32, y: String)
Annotations should be written before the attributes of a definition. The '#' symbol should be used to distinguish them from other identifiers.
#Foo
#Bar(12, "bar")
def do-stuff = ...
A reflection mechanism should be provided to list the annotations associated with a definition. It should also be possible to list the definitions annotated with a given annotation. These mechanisms should be exposed in the Gypsum language and in the CodeSwitch API.
Currently, `Array` can be built just using a length. This means all of its elements are `null`, breaking type safety.
We should provide a number of ways for the elements of new arrays to be specified:
The compiler should be fixed to ensure that an array with non-nullable elements can't be defined. We might also want some expression to be able to check for null quickly without doing a full type check. Maybe variable patterns could have a special case for this.
It should be possible to define type parameters with a `template` attribute. This will be a lot like template type parameters in C++.
Instead of being expanded at compile-time though, templatized definitions will be expanded and specialized at run-time by CodeSwitch.
It should be possible to inspect the type of a template type parameter, since we can store the type in the expansion.
Template type parameters don't need to be bounded by `Object` and `Nothing` as regular type parameters currently are, because the code generated by the compiler doesn't need to work with all possible types; CodeSwitch will fix it up when expanding the template at run-time.
Gypsum arrays have a 32-bit length. Max length is currently 2^31-1 since length is stored as a signed value.
We should cap the maximum block size on the CodeSwitch heap at 2^31-1 (or even smaller) for compatibility with 32-bit platforms. We may want to do some offset arithmetic with 32 bit integers. Having an array of maximum length would certainly exceed this. We should be sure to calculate total array size with 64-bit integers, then compare this with the limit.
If the allocation fails with a size limit, we should throw an OutOfMemoryException (which will need to be added). We should throw this when allocation fails in general.
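The overflow-safe check can be sketched as below. The cap, header size, and function names are assumptions for illustration; the point is that the total size must be computed in arithmetic wide enough that it cannot wrap before the comparison.

```python
# Compute total array size in wide arithmetic, then compare it to
# the block-size cap. Python ints are arbitrary precision; the C++
# version must use 64-bit integers for the multiplication so a huge
# length cannot wrap around to a small size.

MAX_BLOCK_SIZE = 2**31 - 1   # assumed cap for 32-bit compatibility

def array_size_bytes(length, element_size, header_size=16):
    return header_size + length * element_size

def check_allocation(length, element_size):
    size = array_size_bytes(length, element_size)
    if size > MAX_BLOCK_SIZE:
        # Stands in for the proposed OutOfMemoryException.
        raise MemoryError("array too large for a single heap block")
    return size
```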
Example:
class Foo
static let x = 42
This should define a global constant `x` in the scope of `Foo`.
I'm not sure if the compiler will check all transitive dependencies. This can easily be checked at analysis time though, before it causes problems.
Example:
class Foo
def m =
print("foo")
class Bar <: Foo
override def m =
super.m
print("bar")
This should print "foobar".
Use clang-format to reformat C++ code in CodeSwitch.
We'll probably be close to the Google style guide, with a few modifications, since I don't want to change CodeSwitch all that much. In particular, we'll stay with the 96 column limit (or go up to 100), and we'll continue to use camel case identifiers.
`make check-vm` should ensure clang-format is run.
It should be possible to define a default expression for a function parameter. Default parameters will be optional for callers. If the caller does not pass an argument for a default parameter, the default expression will be evaluated to a value that will be passed instead. Non-default parameters may not be declared after default parameters.
def f(x, y = 2, z = x + y) = x + y + z
f(1) // 6
f(1, 0) // 2
f(1, 0, 4) // 5
Currently, when you load a function by name, it just returns the first function that matches, ignoring the possibility that multiple functions may match. The API should allow disambiguation by type. This will require exposing types in the API, but that will have other benefits.
It would also be nice if we could call a method by name, and use the arguments to figure out which overload was intended.
For example:
var a = 0, b = 0
The portion of this declaration `0, b = 0` is interpreted as a tuple, since the grammar is ambiguous. The parser should interpret this as multiple variables being declared. Parentheses should be required if this was actually intended to be a tuple.
After left and right are rolled up to `ClassType`s, `Type.lub` calls `findCommonBaseClass` on their classes. This method is only supported on `Class` (at the moment), but this field is actually `ObjectTypeDefn`.
`findCommonBaseClass` should be renamed to `findCommonBase`, should be moved to `ObjectTypeDefn`, and should do a better job of finding a base trait or class.
Gypsum allows you to define the length, getter, and setter for an array like this:
class Foo
arrayelements i8, get, set, length
Currently, attributes are allowed before the `arrayelements` keyword and before each definition, but only the attributes before the keyword are used. Attributes before the definitions are ignored.
class Foo
public arrayelements i8, public get, public set, public length
We should remove the attributes from before the `arrayelements` keyword, and only use the attributes before each definition.
It should be possible for functions to accept an indeterminate number of arguments of the same type. Once variadic type parameters are supported (#33), it should be possible for variadic parameters to have different types, corresponding with variadic type parameters. I'd like to avoid allocating a data structure to hold arguments (that can be done explicitly by the programmer if needed). Instead, we should provide special expressions to get an argument at a given index and to get the number of arguments. It should also be possible to pass variadic arguments to another function, expanding them into regular arguments.
def print-all(strs: String...) =
let n = strs.length
var i = 0
while (i < n)
print(strs(i))
CodeSwitch has very little support for binary compatibility in packages. If a developer changes some private implementation detail of their package without changing their public interface, the updated package should be usable without recompiling dependent packages.
There are other aspects to binary compatibility, but this bug will focus on the order and number of public fields and methods. Currently, these are always accessed by index. If private fields are added, or if public fields are re-ordered, that will break derived classes in other packages. Same goes for methods.
We should access fields and methods by name instead. Multiple methods may have the same name, so we should consider type signatures to disambiguate. We already do this in the C++ API for both fields and methods, and it should be possible to use the same indexes to build `Class` and `Meta` objects at run-time.
When looking up definitions by name through the CodeSwitch API, Package and Class perform a linear search, which can be very slow. The first time a lookup is performed, they should build indices of their definitions. After that, definitions can be looked up quickly.
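The lazy-index idea can be sketched in a few lines; the `Package` shape and `find` method here are hypothetical stand-ins for the CodeSwitch API.

```python
# Linear search on first lookup is replaced by building a dict
# index once, then answering every later lookup in O(1).

class Package:
    def __init__(self, definitions):
        self._definitions = definitions   # list of (name, defn)
        self._index = None                # built lazily

    def find(self, name):
        if self._index is None:
            self._index = {}
            for n, d in self._definitions:
                # Keep the first definition per name, matching what
                # the old linear search would have returned.
                self._index.setdefault(n, d)
        return self._index.get(name)
```

Building the index lazily keeps package loading cheap for programs that never look anything up by name.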
Gypsum has a lot of "magic" syntax -- things that seem simple to write (as long as you know how to write them), but are fairly ambiguous and are difficult to compile. It can be annoyingly difficult and unintuitive to fix syntax errors.
Lots of issues off the top of my head:
`a.b()` should be interpreted as a method call or a closure call (depending on what `b` is), but `(a.b)()` is definitely a closure call.
Many of the algorithms in Type (and probably elsewhere) are much less efficient or correct than they could be because there's no way to put Classes, Traits, or TypeParameters into a set or map.
CodeSwitch needs something like DefnId in the compiler. The format doesn't really matter, as long as those objects are unique and hashable. Probably better if they are not allocated on the heap.
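One possible shape for such an identifier, sketched in Python: a small immutable value type that is unique by construction, hashable, and comparable by value. The field names are illustrative, not the compiler's actual `DefnId` layout.

```python
from dataclasses import dataclass
from itertools import count

_next_index = count()

# Unique, hashable identifier for a definition. Being a plain value
# type, the C++ analogue could live outside the garbage-collected
# heap, as the issue suggests.

@dataclass(frozen=True)
class DefnId:
    kind: str     # e.g. "class", "trait", "type-parameter"
    index: int

    @staticmethod
    def fresh(kind):
        return DefnId(kind, next(_next_index))
```

Because equality and hashing are by value, these ids can key sets and maps of Classes, Traits, and TypeParameters directly.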
In many places, especially when dealing with names and source names, we tolerate either `str` or `unicode`. For example, `Scope.bind` and `Scope.define` allow either one. This is confusing and maybe error-prone.
We should consistently use `unicode` for names and source names. The lexer should produce these when it breaks up the source, and we should use it thereafter. The deserializer already reads strings as `unicode`, so we can get some inconsistency depending on whether a name came from source code or a compiled package.
Most compiler errors refer to a specific definition in source code. The location of the definition should be included in the exception if it is known; otherwise `NoLoc` should be used. The source name should be included if it is known; otherwise the full name should be used.
It doesn't make sense to have this logic in every place where we report an error. Instead, exception classes should have a static factory method to create an exception using a definition and a short message. The source name, name, and location can be taken directly from the definition.
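A sketch of that factory method, assuming hypothetical `sourceName`, `name`, and `loc` fields on a definition object; `NoLoc` is stubbed as `None` here.

```python
# Static factory that builds an exception from a definition,
# centralizing the "prefer source name, fall back to full name"
# and "prefer known location, fall back to NoLoc" logic.

NoLoc = None   # stand-in for the compiler's NoLoc sentinel

class CompileException(Exception):
    def __init__(self, loc, message):
        super().__init__(message)
        self.loc = loc

    @classmethod
    def fromDefn(cls, defn, short_message):
        name = defn.sourceName if defn.sourceName is not None else defn.name
        loc = defn.loc if defn.loc is not None else NoLoc
        return cls(loc, "%s: %s" % (name, short_message))
```

Call sites then only supply the definition and a short message, so every error phrases names and locations the same way.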
In CodeSwitch packages, type parameters are considered top-level definitions. They may be referenced by classes, traits, functions, and existential types but are not defined as part of those. This is the wrong way of looking at things. It makes linking very difficult for example. It's not safe to link type parameters by name since their names may be auto-generated (especially for existentials) and are not stable when the source code changes.
Type parameters are actually contextual and should be encoded as such. They are always defined as part of something. When serializing, we should keep a stack of type parameters and refer to them by index. When flattening definitions, we should copy implicit type parameters from outer scopes, rather than referring to them directly.
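The stack-of-type-parameters scheme can be sketched as follows. Referring to a parameter by its index from the top of the stack (de Bruijn style) stays stable when names change or are auto-generated; the class and method names here are hypothetical.

```python
# While serializing, type parameters in scope live on a stack and
# references to them are written as indices, not names.

class TypeParamStack:
    def __init__(self):
        self._stack = []

    def push(self, param):
        # Entering a class, trait, function, or existential that
        # binds this parameter.
        self._stack.append(param)

    def pop(self):
        self._stack.pop()

    def serialize_ref(self, param):
        # Index 0 is the innermost (most recently pushed) binder.
        for i, p in enumerate(reversed(self._stack)):
            if p is param:
                return i
        raise KeyError("type parameter not in scope")
```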
There should be an attribute for type parameters (maybe `dynamic`) in classes, traits, and functions that means their types should be visible at run-time.
Function example:
def is-instance[dynamic T](obj: Object) =
match (obj)
case _: T => true // type test on T
case _ => false
Class example:
class Box[dynamic T](value: T)
...
def get-string-from-box(obj: Object) =
match (obj)
case Box[String](s) => s
case _ => ""