jayconrod / gypsum
An experimental new programming language
License: Other
Use pylint to locate and fix potential errors.
Use yapf to reformat code.
We should use PEP8, but we'll probably need some customization, since I don't want to make huge changes to the compiler or make it more difficult to edit. At minimum, we'll probably stick with camel case identifiers and 96 columns (or go up to 100).
`make check-compiler` should ensure that pylint has been run and that yapf has no effect.
Many instructions can throw built-in exceptions. For example, instructions dealing with memory can throw `NullPointerException`s if the receiver is null. Pointer maps are recorded for these instructions, since they may need to allocate exceptions, and that can trigger garbage collection.
Pointer maps encode the length of the stack. That means that when executing these instructions, we can't pop anything off the stack until we're sure we don't need to do garbage collection. Currently, the interpreter is not consistent about this.
It would also be good to pre-allocate built-in exceptions, so we don't need to record pointer maps for instructions that may throw them.
`__all__` declarations should be used in the compiler's modules (e.g. `declarations.combinators` and `ir_types`).
Example:
let f = lambda (x: i64) x
let g = f(12)
class C(f: Function1[String, String])
def f(c: C, s: String) = c.f(s)
This is blocked by #45. We need to be able to use more than one definition from a single AST node. In particular, we need to reference the variable or field containing the closure (which may need to be captured or externalized), and we need to reference the closure method being called (which may need to be externalized).
Storing strings as sequences of 32-bit code points is inefficient in terms of memory use compared with UTF-8. Most common string operations (search, split, comparing for equality) work just as well if not better on UTF-8. Operations that do require code points (collation, normalization, other locale-specific stuff) generally work fine with iteration and do not need random access.
The built-in `String` class should represent strings as an array of UTF-8 bytes. Gypsum and CodeSwitch will both need to be updated.
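To illustrate the claim above, here is a minimal Python sketch showing that search and split work directly on UTF-8 bytes, while code points are still recoverable by iteration when needed. The function names are illustrative only, not part of any Gypsum or CodeSwitch API.

```python
# Hypothetical sketch: common string operations on UTF-8 bytes,
# without decoding to code points first.

def utf8_find(haystack: bytes, needle: bytes) -> int:
    # Byte-wise search is correct for UTF-8: the encoding is
    # self-synchronizing, so one character's encoding never appears
    # inside another's.
    return haystack.find(needle)

def utf8_split(s: bytes, sep: bytes):
    # Splitting on a byte sequence works without decoding.
    return s.split(sep)

def utf8_code_points(s: bytes):
    # Iteration recovers code points when they are genuinely needed
    # (collation, normalization); no random access is required.
    return [ord(c) for c in s.decode("utf-8")]
```

Note that `utf8_find` returns a byte offset, not a code-point index, which is usually what subsequent byte-oriented operations want anyway.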
The flag should be set automatically in declaration analysis.
Likewise, if we are calling an inherited non-final method on a receiver which has a final class type, the method should be treated as final. Nothing is going to override it.
`std.Exception` should accept a string message in its constructor, and there should be a public method to retrieve this message. `to-string` should also be implemented to include the exception class name and this message.
The CodeSwitch API should provide a convenient way to call `to-string` for any `Object`. `Object::toString` can return a `String` directly.
`codeswitch::Exception` and `codeswitch::Error` should provide more convenience methods on top of this. Something that returns `std::string` would be nice.
Tests generated with `mktest.py` should use these methods and should print something reasonable when an interpreted test fails.
Gypsum should have an expression which casts an object type to another object type lower on the type lattice. This is not safe by definition, so the cast will include a run-time type test, and an exception will be thrown if the type test fails.
It's already possible to accomplish this through pattern matching, but it's inconvenient:
def cast-old(obj: Object): String =
match (obj)
case s: String => s
case _ => throw CastException()
def cast-new(obj: Object) = obj as String
With dynamic type parameterization (#36), we can just have a method in `Object` to do this, but this expression is much easier to implement in the short term.
If the same symbol is imported in multiple files in the same package, the Gypsum compiler currently reports an error "already declared". This is because the compiler considers all files to be in the same scope.
We do want definitions defined in the global scope of a file to be visible in other files without any explicit import or qualification. However, imports should only be visible within the file that contains them. So we can't simply define a scope for each file. Maybe we can define a scope for each file, then implicitly import top-level definitions into all other file scopes, then process explicit imports in each file scope.
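The proposal above can be sketched with plain dicts: one scope per file, implicit cross-file visibility for top-level definitions, and file-local explicit imports. All names here are hypothetical; the real compiler's scope classes differ.

```python
# Minimal sketch of the proposed per-file scoping.

class FileScope:
    def __init__(self):
        self.top_level = {}   # definitions declared in this file
        self.imports = {}     # explicit imports, visible only here

    def bind(self, name, defn):
        self.top_level[name] = defn

    def import_(self, name, defn):
        self.imports[name] = defn

def lookup(name, this_file, all_files):
    # Explicit imports are consulted only for the file that wrote
    # them; top-level definitions are visible from every file.
    if name in this_file.imports:
        return this_file.imports[name]
    for f in all_files:
        if name in f.top_level:
            return f.top_level[name]
    return None
```

With this shape, importing the same symbol in two files binds it in two independent `imports` dicts, so no "already declared" collision occurs.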
Annoying spelling mistake.
Currently, we don't report an error when overloaded functions have the same type. We just report an error when one of these functions is called because the call is always ambiguous.
We need all definitions in a package to have unique names, and overloaded functions with the same parameter types will have the same name.
This error may be reported at the end of type declaration analysis, before we assert names are unique.
Need to add the following methods to `List`, `Dict`, `Set`, `Iter`, and `Iterator`.
It may make sense to implement these in a common base trait.
Getting an error like the one below when building with `make build-compiler`:
ValueError: script '/home/jay/Code/gypsum/compiler/compiler/compiler' not found
pyinstaller 3.0 is the default when installed with `pip install PyInstaller`. As a workaround, use `pip install PyInstaller==2.1`.
We might need some configuration process to tell what version of pyinstaller is present and modify the spec file accordingly.
When an error occurs, the compiler shouldn't completely bail out. It should proceed as far as it can and accumulate and print multiple errors. In general, this will happen within each phase. So if we get multiple errors in declaration analysis, we won't proceed to type analysis, but we can process all of the declarations.
Rather than being raised, errors should be reported to `CompileInfo`. In cases where errors still need to be thrown, they should be caught and handled.
This may also open the door to warnings, which would be reported in the same way.
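A sketch of the accumulate-then-check pattern described above, assuming a hypothetical reporter class (the real `CompileInfo` interface may look quite different):

```python
# Diagnostics are accumulated instead of raised immediately; the
# compiler stops only at a phase boundary, after reporting every
# error the finished phase produced.

class CompilePhaseFailed(Exception):
    def __init__(self, errors):
        super().__init__("%d error(s)" % len(errors))
        self.errors = errors

class Reporter:
    def __init__(self):
        self.errors = []
        self.warnings = []   # warnings fit the same channel

    def error(self, loc, message):
        self.errors.append((loc, message))

    def warning(self, loc, message):
        self.warnings.append((loc, message))

    def check_phase(self):
        # Called between phases, e.g. after declaration analysis
        # and before type analysis.
        if self.errors:
            raise CompilePhaseFailed(self.errors)
```

This is why warnings fall out almost for free: they go through the same `Reporter`, but `check_phase` ignores them.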
It should be possible to use an anonymous function as an expression.
let add = lambda (x: i32, y: i32) x + y
The types of lambda parameters may be required initially, but should eventually be inferred. Lambda functions will not be allowed to have type parameters. They will implement one of the `FunctionN` traits if their return type and parameter types are all object types.
Every error message should explain what went wrong and suggest possible solutions. For example:
If the error is on a single line, the compiler should print out the line and point to where the error occurred for context.
In Gypsum, it should be possible for a value to have a type that is a union of two or more types. For example:
class Foo
class Bar
def f(b: boolean): Foo | Bar = if (b) Foo() else Bar()
The type `Foo | Bar` is a union of `Foo` and `Bar`. `Foo` and `Bar` would each be subtypes of it, and for an instance of this type, it should be possible to call methods or access fields that are found in both types (those inherited from any common base class or trait).
This will be useful for exception handling (Java has something like this) and pattern matching in general. It is necessary for `Type.lub` to be properly implemented for traits.
Intersection types may also be useful, but I'm not sure when they would be used.
It should be possible for callers to write the names of some of the arguments being passed to a function. This will be particularly useful for functions with a lot of parameters, especially if many of them have default expressions. Named arguments don't need to be passed in order. No unnamed arguments can be passed after the first named argument.
def f(a: Object, b: String, c: i32 = 1, d: i32 = 2, e: i32 = 3)
f(Object(), d = 4, c = 3, b = "foo")
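The argument-to-parameter matching the compiler would need can be sketched as follows. This is an illustration, not the Gypsum implementation; `match_args` and its signature are hypothetical, and the "no unnamed after named" rule is assumed to be enforced by the parser, which is why positional and named arguments arrive separately.

```python
# Bind positional then named arguments to parameters, falling back
# to defaults for anything left unbound.

def match_args(params, defaults, positional, named):
    """params: parameter names in declaration order.
    defaults: dict of parameter name -> default value.
    Returns the full argument list in parameter order."""
    bound = dict(zip(params, positional))
    for name, value in named.items():
        if name in bound:
            raise TypeError("duplicate argument: " + name)
        if name not in params:
            raise TypeError("unknown parameter: " + name)
        bound[name] = value
    result = []
    for p in params:
        if p in bound:
            result.append(bound[p])
        elif p in defaults:
            result.append(defaults[p])
        else:
            raise TypeError("missing argument: " + p)
    return result
```

Running the document's example `f(Object(), d = 4, c = 3, b = "foo")` through this yields arguments in the order `a, b, c, d, e` with `e` taking its default.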
This is the old implementation of Some.try-match:
class Some[static +T](value: T) <: Option[T]
static def try-match(obj: Object): Option[T] =
match (obj)
case some: Some[_] => some
case _ => None
It's impossible for this function to tell whether `obj` is `Some[T]`, since we don't know at run-time what `T` is. The compiler should report an error.
The problem is that `_` is treated as the variable type `T`. It should be treated as an existential type which is bounded by `T`. If this were the case, the compiler would report an error on the `some` expression since `Some[_]` is not a subtype of `Option[T]`.
For `UseInfo`, we may sometimes use multiple definitions from a single AST node. For example, when calling a closure, we use the variable holding the closure (which may need to be captured), and we also use the function containing the implementation (which may need to be externalized).
For `DefnInfo`, we may want to create multiple definitions at the same node. I can't think of a situation where this is strictly necessary right now, but it may be useful in the future.
Tables in `CompileInfo` should basically be multimaps: a key may correspond to multiple values. It probably makes sense to track the order of the values, so that e.g. `getDefnInfo` would return the first value and `getDefnInfos` would return all of them.
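A minimal sketch of such a multimap table, keeping insertion order so the single-value accessor stays backward compatible. The class and method names mirror the ones mentioned above but are otherwise hypothetical.

```python
from collections import defaultdict

# Multimap-style table: each key maps to an ordered list of values.

class InfoTable:
    def __init__(self):
        self._table = defaultdict(list)

    def set(self, key, info):
        # Values accumulate per key in insertion order.
        self._table[key].append(info)

    def getDefnInfo(self, key):
        # First recorded value, matching the current single-value API.
        return self._table[key][0]

    def getDefnInfos(self, key):
        return list(self._table[key])
```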
The compiler should be able to statically infer types in many cases where they are tedious for the programmer to write. Currently, we infer types for variables and fields, since this is obvious and easy to do. I'd like to be able to infer types in the following situations:
We will not infer types for regular function definitions, other than return types of non-recursive functions, which we do already. Inference will only be performed within the scope of a function definition. This may include other functions through lambda expressions.
We will probably use something resembling Hindley-Milner type inference. During type definition analysis, for each unknown type, we'll create an inferred type object. Whenever we check if a type is a subtype of another, we'll add a constraint to the inferred type and proceed as if the check succeeded. If we need to perform a lookup for a field load or method call, we'll create a new inferred type object for the result of the expression and resolve the lookup later. At the end of type definition analysis, for each inferred type, we'll resolve lookups and pick the most general real type that satisfies all the constraints. If no such type exists, we'll report an error.
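The constraint-accumulation step can be illustrated with a toy sketch. This only shows the shape of the mechanism (unknowns collect subtype constraints, resolution happens at the end); real inference must also handle deferred lookups, variance, and error reporting, and all names here are hypothetical.

```python
# Toy version of the plan above: each unknown type records the
# types it must be a subtype of; resolution picks the most general
# type that still satisfies every recorded constraint.

class InferredType:
    def __init__(self):
        self.upper_bounds = []

    def constrain_subtype_of(self, ty):
        # Recorded instead of checked; the check "succeeds" for now.
        self.upper_bounds.append(ty)

def resolve(inferred, subtype_relation):
    # subtype_relation(a, b) means "a <: b". Among the bounds, the
    # least one is the most general solution: anything above it
    # would violate some other bound.
    for c in inferred.upper_bounds:
        if all(subtype_relation(c, b) for b in inferred.upper_bounds):
            return c
    raise TypeError("no type satisfies all constraints")
```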
To be designed.
Basically, I want something like C++ parameter packs. I'm frustrated with sets of classes like `Tuple2`, `Tuple3`, ... and traits like `Function0`, `Function1`, ... . It should be possible to write a single class, trait, or function that has an indefinite number of type parameters and matching fields or function parameters.
public trait Function[+R, -P...]
public def call(p...: P): R
public class Tuple[+T...](value...: T)
public override def to-string =
"(" + join(value...) + ")"
Gypsum should allow programmers to annotate definitions.
Annotations should be defined with the `annotation` keyword. Annotations may have parameters as part of their definition.
annotation Foo
annotation Bar(x: i32, y: String)
Annotations should be written before the attributes of a definition. The '#' symbol should be used to distinguish them from other identifiers.
#Foo
#Bar(12, "bar")
def do-stuff = ...
A reflection mechanism should be provided to list the annotations associated with a definition. It should also be possible to list the definitions annotated with a given annotation. These mechanisms should be exposed in the Gypsum language and in the CodeSwitch API.
Currently, `Array` can be built just using a length. This means all of its elements are `null`, breaking type safety.
We should provide a number of ways for the elements of new arrays to be specified:
The compiler should be fixed to ensure that an array with non-nullable elements can't be defined. We might also want some expression to be able to check for null quickly without doing a full type check. Maybe variable patterns could have a special case for this.
It should be possible to define type parameters with a `template` attribute. This will be a lot like template type parameters in C++.
Instead of being expanded at compile-time though, templatized definitions will be expanded and specialized at run-time by CodeSwitch.
It should be possible to inspect the type of a template type parameter, since we can store the type in the expansion.
Template type parameters don't need to be bounded by `Object` and `Nothing` as regular type parameters currently are, because the code generated by the compiler doesn't need to work with all possible types; CodeSwitch will fix it up when expanding the template at run-time.
Gypsum arrays have a 32-bit length. Max length is currently 2^31-1 since length is stored as a signed value.
We should cap the maximum block size on the CodeSwitch heap at 2^31-1 (or even smaller) for compatibility with 32-bit platforms. We may want to do some offset arithmetic with 32 bit integers. Having an array of maximum length would certainly exceed this. We should be sure to calculate total array size with 64-bit integers, then compare this with the limit.
If the allocation fails with a size limit, we should throw an OutOfMemoryException (which will need to be added). We should throw this when allocation fails in general.
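The overflow-safe check can be sketched as below. The cap, header size, and function names are assumptions for illustration; the point is that the total size must be computed in arithmetic wide enough that it cannot wrap before the comparison.

```python
# Compute total array size in wide arithmetic, then compare it to
# the block-size cap. Python ints are arbitrary precision; the C++
# version must use 64-bit integers for the multiplication so a huge
# length cannot wrap around to a small size.

MAX_BLOCK_SIZE = 2**31 - 1   # assumed cap for 32-bit compatibility

def array_size_bytes(length, element_size, header_size=16):
    return header_size + length * element_size

def check_allocation(length, element_size):
    size = array_size_bytes(length, element_size)
    if size > MAX_BLOCK_SIZE:
        # Stands in for the proposed OutOfMemoryException.
        raise MemoryError("array too large for a single heap block")
    return size
```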
Example:
class Foo
static let x = 42
This should define a global constant `x` in the scope of `Foo`.
I'm not sure if the compiler will check all transitive dependencies. This can easily be checked at analysis time though, before it causes problems.
Example:
class Foo
def m =
print("foo")
class Bar <: Foo
override def m =
super.m
print("bar")
This should print "foobar".
Use clang-format to reformat C++ code in CodeSwitch.
We'll probably be close to the Google style guide, with a few modifications, since I don't want to change CodeSwitch all that much. In particular, we'll stay with the 96 column limit (or go up to 100), and we'll continue to use camel case identifiers.
`make check-vm` should ensure clang-format is run.
It should be possible to define a default expression for a function parameter. Default parameters will be optional for callers. If the caller does not pass an argument for a default parameter, the default expression will be evaluated to a value that will be passed instead. Non-default parameters may not be declared after default parameters.
def f(x, y = 2, z = x + y) = x + y + z
f(1) // 6
f(1, 0) // 2
f(1, 0, 4) // 5
Currently, when you load a function by name, it just returns the first function that matches, ignoring the possibility that multiple functions may match. The API should allow disambiguation by type. This will require exposing types in the API, but that will have other benefits.
It would also be nice if we could call a method by name, and use the arguments to figure out which overload was intended.
For example:
var a = 0, b = 0
The portion of this declaration `0, b = 0` is interpreted as a tuple, since the grammar is ambiguous. The parser should interpret this as multiple variables being declared. Parentheses should be required if this was actually intended to be a tuple.
After left and right are rolled up to `ClassType`s, `Type.lub` calls `findCommonBaseClass` on their classes. This method is only supported on `Class` (at the moment), but this field is actually `ObjectTypeDefn`.
`findCommonBaseClass` should be renamed to `findCommonBase`, should be moved to `ObjectTypeDefn`, and should do a better job of finding a base trait or class.
Gypsum allows you to define the length, getter, and setter for an array like this:
class Foo
arrayelements i8, get, set, length
Currently, attributes are allowed before the `arrayelements` keyword and before each definition, but only the attributes before the keyword are used. Attributes before the definitions are ignored.
class Foo
public arrayelements i8, public get, public set, public length
We should remove the attributes from before the `arrayelements` keyword, and only use the attributes before each definition.
It should be possible for functions to accept an indeterminate number of arguments of the same type. Once variadic type parameters are supported (#33), it should be possible for variadic parameters to have different types, corresponding with variadic type parameters. I'd like to avoid allocating a data structure to hold arguments (that can be done explicitly by the programmer if needed). Instead, we should provide special expressions to get an argument at a given index and to get the number of arguments. It should also be possible to pass variadic arguments to another function, expanding them into regular arguments.
def print-all(strs: String...) =
let n = strs.length
var i = 0
while (i < n)
print(strs(i))
CodeSwitch has very little support for binary compatibility in packages. If a developer changes some private implementation detail of their package without changing their public interface, the updated package should be usable without recompiling dependent packages.
There are other aspects to binary compatibility, but this bug will focus on the order and number of public fields and methods. Currently, these are always accessed by index. If private fields are added, or if public fields are re-ordered, that will break derived classes in other packages. Same goes for methods.
We should access fields and methods by name instead. Multiple methods may have the same name, so we should consider type signatures to disambiguate. We already do this in the C++ API for both fields and methods, and it should be possible to use the same indexes to build `Class` and `Meta` objects at run-time.
When looking up definitions by name through the CodeSwitch API, Package and Class perform a linear search, which can be very slow. The first time a lookup is performed, they should build indices of their definitions. After that, definitions can be looked up quickly.
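The lazy-index idea can be sketched in a few lines; the `Package` shape and `find` method here are hypothetical stand-ins for the CodeSwitch API.

```python
# Linear search on first lookup is replaced by building a dict
# index once, then answering every later lookup in O(1).

class Package:
    def __init__(self, definitions):
        self._definitions = definitions   # list of (name, defn)
        self._index = None                # built lazily

    def find(self, name):
        if self._index is None:
            self._index = {}
            for n, d in self._definitions:
                # Keep the first definition per name, matching what
                # the old linear search would have returned.
                self._index.setdefault(n, d)
        return self._index.get(name)
```

Building the index lazily keeps package loading cheap for programs that never look anything up by name.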
Gypsum has a lot of "magic" syntax -- things that seem simple to write (as long as you know how to write them), but are fairly ambiguous and are difficult to compile. It can be annoyingly difficult and unintuitive to fix syntax errors.
Lots of issues off the top of my head:
`a.b()` should be interpreted as a method call or a closure call (depending on what `b` is), but `(a.b)()` is definitely a closure call.
Many of the algorithms in Type (and probably elsewhere) are much less efficient or correct than they could be because there's no way to put Classes, Traits, or TypeParameters into a set or map.
CodeSwitch needs something like DefnId in the compiler. The format doesn't really matter, as long as those objects are unique and hashable. Probably better if they are not allocated on the heap.
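One possible shape for such an identifier, sketched in Python: a small immutable value type that is unique by construction, hashable, and comparable by value. The field names are illustrative, not the compiler's actual `DefnId` layout.

```python
from dataclasses import dataclass
from itertools import count

_next_index = count()

# Unique, hashable identifier for a definition. Being a plain value
# type, the C++ analogue could live outside the garbage-collected
# heap, as the issue suggests.

@dataclass(frozen=True)
class DefnId:
    kind: str     # e.g. "class", "trait", "type-parameter"
    index: int

    @staticmethod
    def fresh(kind):
        return DefnId(kind, next(_next_index))
```

Because equality and hashing are by value, these ids can key sets and maps of Classes, Traits, and TypeParameters directly.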
In many places, especially when dealing with names and source names, we tolerate either `str` or `unicode`. For example, `Scope.bind` and `Scope.define` allow either one. This is confusing and maybe error-prone.
We should consistently use `unicode` for names and source names. The lexer should produce these when it breaks up the source, and we should use it thereafter. The deserializer already reads strings as `unicode`, so we can get some inconsistency depending on whether a name came from source code or a compiled package.
Most compiler errors refer to a specific definition in source code. The location of the definition should be included in the exception if it is known; otherwise `NoLoc` should be used. The source name should be included if it is known; otherwise the full name should be used.
It doesn't make sense to have this logic in every place where we report an error. Instead, exception classes should have a static factory method to create an exception using a definition and a short message. The source name, name, and location can be taken directly from the definition.
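A sketch of that factory method, assuming hypothetical `sourceName`, `name`, and `loc` fields on a definition object; `NoLoc` is stubbed as `None` here.

```python
# Static factory that builds an exception from a definition,
# centralizing the "prefer source name, fall back to full name"
# and "prefer known location, fall back to NoLoc" logic.

NoLoc = None   # stand-in for the compiler's NoLoc sentinel

class CompileException(Exception):
    def __init__(self, loc, message):
        super().__init__(message)
        self.loc = loc

    @classmethod
    def fromDefn(cls, defn, short_message):
        name = defn.sourceName if defn.sourceName is not None else defn.name
        loc = defn.loc if defn.loc is not None else NoLoc
        return cls(loc, "%s: %s" % (name, short_message))
```

Call sites then only supply the definition and a short message, so every error phrases names and locations the same way.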
In CodeSwitch packages, type parameters are considered top-level definitions. They may be referenced by classes, traits, functions, and existential types but are not defined as part of those. This is the wrong way of looking at things. It makes linking very difficult for example. It's not safe to link type parameters by name since their names may be auto-generated (especially for existentials) and are not stable when the source code changes.
Type parameters are actually contextual and should be encoded as such. They are always defined as part of something. When serializing, we should keep a stack of type parameters and refer to them by index. When flattening definitions, we should copy implicit type parameters from outer scopes, rather than referring to them directly.
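The stack-of-type-parameters scheme can be sketched as follows. Referring to a parameter by its index from the top of the stack (de Bruijn style) stays stable when names change or are auto-generated; the class and method names here are hypothetical.

```python
# While serializing, type parameters in scope live on a stack and
# references to them are written as indices, not names.

class TypeParamStack:
    def __init__(self):
        self._stack = []

    def push(self, param):
        # Entering a class, trait, function, or existential that
        # binds this parameter.
        self._stack.append(param)

    def pop(self):
        self._stack.pop()

    def serialize_ref(self, param):
        # Index 0 is the innermost (most recently pushed) binder.
        for i, p in enumerate(reversed(self._stack)):
            if p is param:
                return i
        raise KeyError("type parameter not in scope")
```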
There should be an attribute for type parameters (maybe `dynamic`) in classes, traits, and functions that means their types should be visible at run-time.
Function example:
def is-instance[dynamic T](obj: Object) =
match (obj)
case _: T => true // type test on T
case _ => false
Class example:
class Box[dynamic T](value: T)
...
def get-string-from-box(obj: Object) =
match (obj)
case Box[String](s) => s
case _ => ""