jarrettbillingsley / croc

Croc is an extensible extension language in the vein of Lua, which also wishes it were a standalone language. Also it's fun.

Home Page: http://www.croc-lang.org

Vim Script 0.13% C++ 97.20% C 2.33% CMake 0.33%

croc's Introduction

The Croc 🐊 Programming Language

What's Croc?

Croc is a small, dynamically-typed language most closely related to Lua, with C-style syntax. Its semantics are borrowed mainly from Lua, D, Squirrel, and Io, though many other languages served as inspirations.

It's not developed anymore. No one ever used it. But I still love it 😔

croc's Issues

Add 'custom verbatim' to doc comment markup

You can have custom sections and custom spans, but there's no way to do anything like custom text structuring. How about a 'custom verbatim' section, where you can add a "type" to the \verbatim command (like with \code) so you can embed stuff inside doc comments. Liiiiike:

\verbatim[dot]
   And here you'd put a dot diagram description!
\endverbatim

Resolve the syntactic inconsistency between for(each) loops and for comprehensions (as well as if)

The comprehension syntax was (blindly) borrowed from Python, and uses 'in' to separate the indices from the container, as well as not distinguishing between 'for' and 'foreach':

[x for x in 1 .. 10]
[x for x in someContainer()]
[x for x in y if someCondition()]

The control structures, however, use a different syntax:

for(x: 1 .. 10) // or for(x; 1 .. 10)
foreach(x; someContainer())
if(someCondition())

Either the comprehensions should use the same syntax as the loops, the loops should use the same syntax as the comprehensions, or some other compromise.

Method calls with non-const method names AND custom context are not codegen'd correctly (if that's even possible?)

Something like:

a.(foo())(with b)

Fails, since foo() puts its result in what becomes the 'this' register and the method name returned from it is subsequently overwritten by b.

I can't think of any way to solve this other than changing the way method call instructions work (which would also change the native API).

I guess a good question would be: why the hell would you ever want to use a custom context on a method call? Why would you want to look up the method through one object, only to replace the context with another object? It just seems like it's asking for trouble anyway. In the rare cases that you need to call some function that happens to be in a namespace with a custom context, like foo.bar(with x), you could just surround the operand with parens, giving (foo.bar)(with x).

Bignums?

Support for arbitrary-precision integers? They tend to be useful. Should they be a basic type, or should they be implemented as objects? hmm..

Custom parameter type constraints

Turn stuff like

function foo(x: @blah)
{
}

Into

function foo(x)
{
    assert(blah(x), "bad parameter x");
}
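The same desugaring can be mimicked outside Croc. Below is a rough Python sketch (not Croc code) in which a decorator checks a parameter against a predicate on every call; `constrain` and `is_even` are made-up names standing in for the `@blah` constraint.

```python
import functools
import inspect

def constrain(**predicates):
    """Check named parameters against predicate functions on every call."""
    def decorate(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            for name, predicate in predicates.items():
                # Plays the role of: assert(blah(x), "bad parameter x")
                if not predicate(bound.arguments[name]):
                    raise TypeError(f"bad parameter {name}")
            return func(*args, **kwargs)
        return wrapper
    return decorate

def is_even(value):
    # Hypothetical constraint function, standing in for 'blah'.
    return isinstance(value, int) and value % 2 == 0

@constrain(x=is_even)
def foo(x):
    return x // 2
```

Calling `foo(4)` goes through; `foo(3)` fails the predicate and raises before the body runs.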

Make safe stdlibs always loaded

The safe stdlibs are indispensable and many of the modules add methods to built-in types, without which they would be nigh-useless. I can't think of a reason why not loading the charlib, or the mathlib, or what have you, would be beneficial. If someone wants to replace the standard library they can muck around with the Croc source, since they'd have to deal with some of the "special" modules anyway (like exceptions and modules).

The unsafe libs should of course be optional.

Add support for tango.core.Variant

It'd be cool if I could just push/get a Variant, because I have a bit of code that uses Variants but should be independent from MiniD. Like:

Variant v = Variant(5);
pushVariant(t, v);

auto w = getVariant(t, 0);

memblock.makeString

It'd be quite handy if I could create a string from a memblock (since sometimes memblocks contain strings).
When I run this query "SHOW COLUMNS FROM table" two fields will be marked binary, but in fact contain strings.
Right now I have to do something like this to get a string, when I have a memblock in binary.

binary = binary.toArray()
binary.apply(toChar)
local str = string.joinArray(binary)

This does look inefficient to me, so I thought it might be handy to have this in the base lib.
There are probably better ways to name it, but toString is already taken :P
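To illustrate the difference, here is a small Python analogy (not Croc): the first function mirrors the element-by-element workaround, the second is what a direct memblock-to-string conversion would amount to. The function names and the UTF-8 default are assumptions for the sketch.

```python
def make_string_slow(block: bytes) -> str:
    # Mirrors the workaround: convert one element at a time, then join.
    # Each byte is treated as a code point, so this only matches the
    # direct decode below for ASCII data.
    return "".join(chr(b) for b in block)

def make_string(block: bytes, encoding: str = "utf-8") -> str:
    # One pass over the bytes, validating the encoding as it goes.
    return block.decode(encoding)

binary = b"varchar(255)"
```

For ASCII content like the column type above, both give the same string; the direct version just avoids building an intermediate array.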

minid.ex.runFile incorrectly sets the name of the module to the function name

Which, because of the way the compiler works, is always "", meaning the file's namespace then becomes _G.(""). I remember making the top-level function named this to avoid weird error names elsewhere (like foo.bar.foo.bar(31)), but this has only introduced another problem. The root problem is that the compiler does not give back the module name. It should.

Instance fields and extra values may already be collected when finalizers are run

This is a problem.

Add syntax for catching the new typed exceptions

Something like

try
   ...
catch(e: E1)
   ...
catch(e: E2|E3)
   ...

It'd be transformed by the compiler into:

try
   ...
catch(e)
{
   switch(e.super)
   {
      case E1: { first catch } break
      case E2, E3: { second catch } break
   }
}
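One thing the transformation above leaves open is what happens when no case matches. A Python sketch of the same shape (one untyped catch, then dispatch on the exception's class) is below; the final rethrow is an assumption, not something the proposal states, and `run` and `thrower` are illustrative helpers.

```python
class E1(Exception):
    pass

class E2(Exception):
    pass

class E3(Exception):
    pass

def thrower(exc_type):
    """Build a callable that raises the given exception type."""
    def action():
        raise exc_type("boom")
    return action

def run(action):
    try:
        return action()
    except Exception as e:            # the single untyped catch(e)
        if isinstance(e, E1):         # catch(e: E1)
            return "first catch"
        elif isinstance(e, (E2, E3)): # catch(e: E2|E3)
            return "second catch"
        raise                         # assumption: no match means rethrow
```

Note that `isinstance` also matches subclasses, whereas switching on `e.super` compares against one exact class; which behavior is wanted is a design decision for the real feature.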

Make Vector a core type?

Py3K has bytes as a core type. Vector is proving invaluable. Should it be made one as well?

Module system improvements

There are a few things I want to do:

Make it so module namespaces aren't inserted into the global namespace hierarchy until after they've been successfully loaded. No more half-loaded modules sitting in the globals.
Catch exceptions thrown by the module init function and turn them into ImportExceptions (dunno why I didn't do that during the great Exceptioning)
When a module is reloaded, re-insert sub-module namespaces as members of the namespace.
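The first two points can be sketched in a few lines. This is a Python model, not the Croc module loader: `loaded_modules` stands in for the global namespace hierarchy and `load_module` is a hypothetical helper. Reloading and sub-module re-insertion are not covered.

```python
class ImportException(Exception):
    pass

# Stands in for the global namespace hierarchy.
loaded_modules = {}

def load_module(name, init):
    namespace = {}                      # stays private until init succeeds
    try:
        init(namespace)
    except Exception as exc:
        # Wrap whatever the init function threw; nothing was published.
        raise ImportException(f"error loading module {name}") from exc
    loaded_modules[name] = namespace    # publish only after success
    return namespace
```

A failing init function leaves no half-loaded namespace behind, and the original error is still reachable through the wrapping exception.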

push/popHeap considered dangerous

The underlying Tango functions used to implement these functions resize the array's buffer, causing the MiniD array to point onto the D heap, which is all kinds of bad.

Come up with some kind of standardized syntax for doc comments

Trying to port the stdlib docs over to built-in docs and realizing what folly it is to embed raw TracWiki markup in the docs. It looks terrible when dumped to the console in croci and so forth. I'm also feeling constrained by the lack of param docs, cross-references, etc.

Emulating DDoc seems like a substandard idea. DDoc is well-suited to statically-compiled D, where all the text macros that will be used to process the file are determined statically; trying to port that behavior to Croc sounds like a nightmare. DDoc is flexible but perhaps a little too flexible. Not to mention the $(MACRO syntax) leaves something to be desired. Maybe something more along the lines of Doxygen or Javadoc?

Make order of class wrapping irrelevant?

Currently when you wrap multiple classes with chained .type calls to WrapModule, the D compiler will evaluate the wrapped class functions in an odd order, making wrapping base classes and their children odd (not to mention compiler-dependent). Could it be possible to make wrapping classes order-independent, so that they'd maybe be instantiated when the module was loaded the first time?

{{{
Sorry I haven't worked on this or responded to it in so long.

I haven't come up with a good way to solve this. It seems like it should be easy, but then you consider things like: what happens if the base class and the derived class are in different modules? How do you ensure that the modules are loaded in the correct order? What if there is a circular dependency of derivation between two modules (i.e. module a has "Base1" and "Derived2 : Base2" and module b has "Base2" and "Derived1 : Base1")? Blahblahblah.

Really, I don't know if there is a simple solution. It's an arbitrary dependency graph which would have to be built up in a pre-wrapping phase and checked for cycles.. ick.

I don't mean to sound like I'm copping out on this, but it just seems easier to require that the base class be wrapped before the derived. Maybe WrapType could take some kind of flag (or have another version of WrapType) to indicate that you'll be wrapping the base class too, in order to at least put some kind of runtime check to ensure that the base class has been wrapped.
}}}
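For what it's worth, the "pre-wrapping phase" amounts to a topological sort. A Python sketch using the standard `graphlib` module (Python 3.9+) is below; it is not the binding library, and `wrapping_order` plus the two example graphs are made up to match the Base1/Derived1 scenario described above.

```python
from graphlib import TopologicalSorter, CycleError

def wrapping_order(bases):
    """bases maps each name to the names that must be handled before it."""
    return list(TopologicalSorter(bases).static_order())

# Class-level graph: each class lists the base classes it derives from.
classes = {
    "Base1": [],
    "Base2": [],
    "Derived1": ["Base1"],
    "Derived2": ["Base2"],
}

# Module-level graph for the same classes: a needs b (for Base2) and
# b needs a (for Base1), so ordering whole modules is impossible.
modules = {"a": ["b"], "b": ["a"]}

def has_cycle(graph):
    try:
        wrapping_order(graph)
    except CycleError:
        return True
    return False
```

In this example the class graph sorts fine while the module graph is cyclic, which suggests any ordering would have to be done per class rather than per module.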

Moar GC interface

Now that the GC actually has some tweakable parameters and multiple kinds of cycles, we need some way of interacting with it!

Keep stdlib char/string/culture handling simple

I've been kind of taking some of the more advanced text handling features of Tango for granted, but looking at what I'd have to do to keep the same level of functionality in the stdlib (for things like date/time formatting) when porting to C++, it would introduce a hell of a lot of external dependencies. ICU would likely be the best choice, or Boost.Locale as a wrapper for it; but still, a lot of this functionality is overkill for most apps. Complex I18N should be handled by a second- or third-party library; the stdlib should only handle the most basic string processing, and maybe offer a strftime-esque way of formatting dates.

The only thing that I worry about is stuff like .toUpper and .toLower. This is really common functionality, but as it turns out, you really need a lot of support code to do this correctly for Unicode. Same goes for case-insensitive ops in the string lib. Should this stuff just work on ASCII then?
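A quick illustration of why full Unicode case mapping needs support code, using Python as a stand-in: an ASCII-only upper function is a 26-entry translation table, while the Unicode-aware one can even change the length of the string. `ascii_upper` is a hypothetical name for the simple version.

```python
import string

# Translation table covering only a-z.
_ASCII_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)

def ascii_upper(text: str) -> str:
    # Only a-z are changed; every other code point passes through untouched.
    return text.translate(_ASCII_UPPER)

word = "straße"
ascii_result = ascii_upper(word)   # leaves the sharp s alone
unicode_result = word.upper()      # full mapping expands it to "SS"
```

The ASCII version is predictable and dependency-free, at the cost of silently ignoring everything outside a-z.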

Better binding lib docs and examples

Needs to happen.

Finish the builtin stdlib docs

Do thaaaaaat.

Split up serialization into two layers

There should be a lower layer which is only concerned with converting objects to and from raw blobs of bytes (memblocks or streams?), and an upper layer which adds the actual serialization abstraction. The lower layer could have stuff written in native code for what couldn't be done in Croc, and the upper layer could be entirely done in Croc (which makes it easier to use Croc streams).

Several methods of string.StringBuffer can behave oddly/corrupt data if 'this is other'

Some methods (opEquals, opCmp, opCatAssign) detect if 'this is other' and act appropriately, but others (opCat, opCat_r, fill, insert) do not. Fix this.

Separate memblock and vector functionality

The more I think about it, the more sense it makes to keep memblocks as a basic blob of bytes. Having to check that memblocks are of the appropriate type is awkward and annoying. Really, the vector functionality (treating them as mathematical vectors of numbers) should be moved into a separate object using a memblock to back it, much like StringBuffer uses a memblock as its backing data store.
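The layering might look something like this Python sketch, where a plain `bytearray` plays the memblock and a separate `Vector` class gives it a numeric interpretation. The class and its float64-only restriction are assumptions for the example, not the planned Croc API.

```python
class Vector:
    """Numeric view over a plain block of bytes (float64 only in this sketch)."""

    def __init__(self, block: bytearray):
        if len(block) % 8 != 0:
            raise ValueError("block size must be a multiple of 8 bytes")
        self.block = block
        # Reinterpret the same memory as doubles; no copy is made.
        self._view = memoryview(block).cast("d")

    def __len__(self):
        return len(self._view)

    def __getitem__(self, index):
        return self._view[index]

    def __setitem__(self, index, value):
        self._view[index] = float(value)

    def dot(self, other):
        return sum(a * b for a, b in zip(self._view, other._view))
```

The byte block never needs to know what type it holds; all the type checking lives in the wrapper.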

Better GC

The current GC is simple and (as far as I know!) correct. But it could stand to be better. It's just a simple single-pass mark-and-sweep algorithm right now. Need to research incremental and generational collection, as well as possible GC optimizations depending on data types. Also make sure that invariants like deterministic removal of null weakrefs from tables still hold.
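For reference, the basic mark-and-sweep idea fits in a few lines. This is a toy Python model over an explicit object list, not the Croc collector; `Obj` and `collect` are invented for the sketch.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []       # outgoing references to other Obj instances
        self.marked = False

def collect(heap, roots):
    """Return the objects in heap that are reachable from roots."""
    # Mark phase: flag everything reachable from the roots.
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if not obj.marked:
            obj.marked = True
            stack.extend(obj.refs)

    # Sweep phase: keep marked objects, drop the rest, reset the marks.
    survivors = [obj for obj in heap if obj.marked]
    for obj in survivors:
        obj.marked = False
    return survivors
```

Every collection walks the whole reachable graph, which is exactly the cost that incremental and generational schemes try to avoid.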

Binding lib doesn't use throwStdException

https://gist.github.com/6705f963139ca2d2c849

EDIT:
You will probably want to use the proper Exception types, now that I think about it.

Remove the attrs functions from the baselib?

The attrs decorator is actually a weird holdover from a much earlier point in Croc/MiniD's development. I originally ported the "attribute tables" from Squirrel, which was a way to attach arbitrary data to program objects. It had a dedicated syntax and the references to the attribute tables were actually stored in the objects they were attached to. However once decorators and weak references were added, I dropped the special syntax and reimplemented attributes as a decorator function.

But with the addition of generic decorators and doc comments, most of the utility of the attribute tables has been lost. Why would you bother writing something like @attrs({serializable = true}) and then checking if attributesOf(Class).serializable is true, when you could simply write a little serializable decorator that registered the class with a serialization library? Besides, the attribute table functionality is so trivial to implement that it seems almost pointless to include it in the standard library. It's just a light wrapper around a WeakKeyTable.
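To show how thin that wrapper is, here is roughly the same thing in Python, with `weakref.WeakKeyDictionary` standing in for WeakKeyTable. The names `attrs` and `attributes_of` follow the issue text, but the code is an analogy, not the baselib implementation.

```python
import weakref

# Attribute tables live outside the objects they describe and vanish
# when the object itself is collected.
_attributes = weakref.WeakKeyDictionary()

def attrs(table):
    def decorate(obj):
        _attributes[obj] = table
        return obj
    return decorate

def attributes_of(obj):
    return _attributes.get(obj)

@attrs({"serializable": True})
class Point:
    pass
```

That is the whole feature: one weak table, one decorator, one lookup function.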

So if there's no real reason to keep them around, I may just drop them.

Provide means for accessing 'this' param of enclosing functions

This is something people tend to run into a lot in JavaScript: they want to make a closure that accesses the enclosing function's 'this'. However, because 'this' is implicitly defined, there's no way to directly access it by name; you have to instead save the outer 'this' in a local and use that local in the closure instead. You have to do the same in Croc:

function outer()
{
    local self = this
    return function() self.doSomething()
}

It would be nice if, instead, there were a way to directly access the 'this' of an enclosing function (or even, arbitrarily many enclosing functions). 'this' is just a normal parameter anyway, aside from not being writeable; there's no technical reason it can't be done.

Return type constraints?

Would return type constraints be possible/useful? Wouldn't really provide the optimization opportunities that param type constraints do, but would at least be a useful debugging aid..

Net [addon] library

I was considering this for an addon library, but considering how important networking and the internet have become for run-of-the-mill applications, I'm thinking it might be important enough to include it in the core libraries. There needs to be some kind of socket abstraction, probably as low-level as Python has it, to make apps that can interface with a wide range of net APIs, as well as some higher-level interfaces for common tasks.

Some pastes:

http://paste.dprogramming.com/dpsbu973 http://paste.dprogramming.com/dpowsing

Provide means to dump objects similar to tables (json)

Having to transform any object to table before I can use dumpVal or toJSON on it seems like too much work.
Shouldn't there be enough information to do that automatically?

Some way to auto-convert non-float params to floats on entry to function

It's really annoying to write something like:

function foo(x: int|float)
{
    x = toFloat(x)

    // do stuff with x
}

Just to allow foo to be called with ints or floats.

Maybe there should be a "number" parameter type constraint which would insert an automatic conversion to float if the parameter is an int. So the above becomes

function foo(x: number)
{
    // do stuff with x
}
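What the "number" constraint would do on function entry can be modeled with a decorator. This is a Python sketch of the behavior, not Croc; the decorator name `number` and the error message are assumptions.

```python
import functools
import inspect

def number(*param_names):
    """Accept ints or floats for the named parameters and pass floats on."""
    def decorate(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            for name in param_names:
                value = bound.arguments[name]
                # bool is a subclass of int in Python, so exclude it explicitly.
                if isinstance(value, bool) or not isinstance(value, (int, float)):
                    raise TypeError(f"parameter {name} must be int or float")
                bound.arguments[name] = float(value)   # the automatic conversion
            return func(*bound.args, **bound.kwargs)
        return wrapper
    return decorate

@number("x")
def foo(x):
    # By the time the body runs, x is always a float.
    return x / 2, type(x).__name__
```

The body never sees an int, so the manual `toFloat` line disappears from every such function.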

Namespace freezing

Look into this. I remember it coming up at some point. It'd be good for sandboxes and performance.

Remove redundancy between croc.ex.CrocDoc and croc.compiler_docgen

They have a lot of shared mechanisms, and they should be refactored into a separate module.

Split the IO library into two

The filesystem access part and the path manipulation part should be two separate libraries. Path manipulation is obviously safe. Also 'io' is kind of a terrible name; maybe call the two parts 'path' and 'file'.

Remove the 'coroutine' expression

I originally implemented the 'coroutine' expression because it felt silly to create a built-in type with a function. Threads all felt very second-class in Lua. However, there's a good reason thread creation should be a function: sandboxing.

Also I want to purge any use of the word 'coroutine' from the language and docs because coroutines are an abstraction (a thin one, but an abstraction nonetheless) built atop threads, and thread is the name of the type (because that's what it is!).

nativeobjs that refer to scope objects can crash the VM upon their collection if the D object has gone out of scope

So something like

{
    scope c = new Class()
    pushNativeobj(t, c);
}

Can crash when the nativeobj is collected, as it tries to access the D object's classinfo, but that instance sits on the stack which has, by that point, been thoroughly smashed.

I don't know how to solve this. It might be possible to throw an exception in pushNativeobj if it detects that the reference points into the stack, but how do you determine what the stack is? Is there runtime stuff that can tell me? Is it portable?

Some sugar for lambdas

Another use would be "lazy parameters" (as known in D). Currently I have to write:

global function lazyFunc(param : function)
{
    writeln $ "I'm a lazy ", param()
}

lazyFunc(\-> "param")

That doesn't look very nice either, in my opinion. I think it could be made much shorter and better looking. Some suggestions:

lazyFunc(\\"param")
lazyFunc(\"param")
lazyFunc(:"param")

local lazy param = "test"
local param = lazy "test"
//translates to local function param() = "test"
lazyFunc(param)

lazyFunc(lazy "test")

Just brainstorming here, I hope you don't mind :P

Revisit table-weakref interaction

Tables are defined to interact with weakrefs in a certain way: if a table key-value pair has a weakref as either the key or the value, and that weakref goes null, then that key-value pair is removed from the table. This behavior was easy to implement with the old tracing collector which looked at every key-value pair in every table in existence every cycle. But the new collector won't even look at tables unless they're modified by the mutator since the last collection. This makes it more difficult to keep track of which tables need to be normalized (have their null weak entries removed). This has made me reconsider the way tables and weakrefs interact.

There are a number of problems with it:

It incurs a performance penalty on EVERY table. In order to keep track of which tables need to be tracked for weak entries, every assignment to a table slot has to be checked for weakref objects. This is an unacceptable penalty since most tables don't need this behavior.
In virtually every case, you won't mix weak key-value pairs with non-weak ones. You want ALL the keys or ALL the values or ALL of both to be weak. Forgetting to weakref an item on insertion leads to annoying bugs.
Related to the above point, it's annoying to have to put weakref() around every key/value/both, or having to deref() when accessing. More tedium, boilerplate, and bugs.

I considered getting rid of weakref objects entirely and switching to weak tables like Lua has, but there are actual legitimate uses for lone weakref objects, especially since the new GC performs worse with cyclic data, and using weakrefs to reduce cycles (such as using them for child-to-parent pointers) can improve performance. Requiring the use of a table object to keep a single weak reference would be silly.

So I think there should be weakrefs AND weak tables. Weak tables could be created with functions like hash.weakKeyTable() and similar; table objects would have bits indicating whether they had weak keys, values, or both, and could be kept in an internal (weak, haha) hash to make sure they could be normalized after GC. They could internally use weakref objects to simplify implementation, and automatically wrap/unwrap them upon access. Or something else.
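Python's standard library happens to have the same split (lone weakrefs plus dedicated weak tables), so it makes a convenient model of the proposed behavior. The sketch below is Python, not Croc; `Node` and `demo` are illustrative names, and the immediate cleanup relies on CPython's collector.

```python
import gc
import weakref

class Node:
    def __init__(self, name):
        self.name = name

def demo():
    cache = weakref.WeakValueDictionary()   # all values are weak
    owners = weakref.WeakKeyDictionary()    # all keys are weak

    node = Node("a")
    cache["a"] = node         # no weakref() wrapping at the call site
    owners[node] = "scene"
    same = cache["a"] is node # and no deref() on access
    before = (len(cache), len(owners))

    del node                  # drop the only strong reference
    gc.collect()
    after = (len(cache), len(owners))
    return same, before, after
```

The weakness is a property of the table rather than of each entry, so forgetting to wrap an item on insertion is no longer possible.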

Avoid "overloading" in stdlib functions

Fake overloading on number and types of parameters (except in the case of one parameter which is treated nearly identically regardless of the type) is not a good fit for Croc and tends to make interfaces tricky to use. It's better to have multiple functions that have different signatures.

Should the GC library be unsafe?

The GC library isn't really a "core" library in the sense that it's not "blessed" in any way like the other core libs. It would, however, be easy for malicious scripts to crash the host by disabling collections, which would cause a memory allocation failure eventually (and most hosts probably wouldn't be programmed to deal with that situation).

There is, however, one feature in the GC library that the stdlib depends upon: the hash library uses post-GC callbacks to normalize weak tables. I suppose this could be solved by making weakref objects work more like in Java (where they can be assigned to a queue to which they're pushed when they go null).. or by exposing the post-GC callback interface to native code, and only exposing it to script code when the GC lib is loaded?

Your self hosted wiki has a redirect loop on the home page

You might want to fix that fast since it prevents reaching the wiki for some people.

Metamethods for builtin types should all be in the baselib

Currently the opApply metamethods for arrays, tables, and namespaces are defined in their respective libraries, but that means if a program doesn't load these libraries, you can't use foreach on those types, which seems silly. I think it should be a policy that all metamethods of builtin types should be part of the baselib, and "extra" methods would only be part of the non-essential libraries.

CrocArray.slice vs. CrocArray.data in superGet

I know you don't like the binding library so please don't hit me!

superGet (around line 768) has this:

    }

    auto data = getArray(t, idx).slice;
    auto ret = new T(data.length);

    foreach(i, ref elem; data)

However you seem to have changed slice to data in types.d

Tiny bug with column location in doc comment errors when lines start with asterisks

When you write this:

/**
 * \table
*/
function f() {}

The error message about \table having no matching \endtable is on the right line but the wrong column because the lexer strips the whitespace and asterisk off the beginning of the line. To fix this, that asterisk-stripping behavior has to be moved into the doc comment lexer.

Bud/Build @lib doesn't work as expected

C:\Users\Andre\Downloads\trunk-r610\trunk>bud @lib -test
Command: 'C:\Users\Andre\dmd\bin\dmd.exe -c @minid.rsp'

C:\Users\Andre\Downloads\trunk-r610\trunk>

This shows that bud will do the compiling, but no lib will be linked.

The option -allobj for bud seems to fix this problem.

C:\Users\Andre\Downloads\trunk-r610\trunk>bud @lib -allobj -test
Command: 'C:\Users\Andre\dmd\bin\dmd.exe -c @minid.rsp'
Command: 'C:\Users\Andre\dmd\bin\lib.exe @minid.lsp'

C:\Users\Andre\Downloads\trunk-r610\trunk>

I'm actually using http://dsource.org/projects/cdc/ to build my project and statically linking to minid works perfectly fine, even the binding library :O

dmd -run cdc -I..\minid -L+..\minid\ -L+DD-minid.lib -ofminey miney bindings

Implement the new stream library

I've had that prototype streamlib sitting around for like half a year now. IMPLEMENT IT.

Change foreach iteration protocol?

Currently there are two protocols: one for "iterator functions" and one for threads. Threads are fine, I think; it's the iterator function protocol that's.. hmmm.

It's borrowed from Lua, and like many things in Lua it's designed to be lightweight and not impose a lot of structure. It does a good job of letting you iterate over simple things without much overhead (by not having to allocate closures to iterate over simple list-like things), but the problem with it is that it's.. not very intuitive. Pretty much everyone who's used Croc on any large scale has asked me just how the hell to write an iterator. Even I have to really think about it when I write one.

Iterator functions also have one major shortcoming: when the first index is null, iteration ends. This works fine for tables, which is what Lua designed it for, but it limits general-purpose use of iterator functions. What if you want to iterate over the rows of a database query, where the first column is nullable? What if you have a map container that DOES allow null as a key? What if you zip over two arrays, the first of which contains nulls? In all of these cases, you have to either put a dummy index at the beginning (which makes the interface less intuitive and increases cognitive load), or you have to use thread iterators (which are overkill for a lot of use cases).
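The null-index problem is easy to reproduce in a model of the protocol. The Python sketch below imitates a Lua-style loop driver (it is not Croc's actual implementation; `foreach`, `make_zip_iter` and `zip_indexed` are invented names), with `None` playing the role of null.

```python
def foreach(iter_func, state, index=None):
    """Drive a Lua-style iterator function; stop when the first value is None."""
    results = []
    while True:
        values = iter_func(state, index)
        index = values[0]
        if index is None:
            break
        results.append(values)
    return results

def make_zip_iter(a, b):
    """Naive zip: yields (a[i], b[i]), so a[i] doubles as the loop index."""
    pos = -1
    def step(_state, _index):
        nonlocal pos
        pos += 1
        if pos >= min(len(a), len(b)):
            return (None,)
        return (a[pos], b[pos])
    return step

def zip_indexed(pair, index):
    """Workaround: put a dummy integer index in front of the real values."""
    a, b = pair
    nxt = 0 if index is None else index + 1
    if nxt >= min(len(a), len(b)):
        return (None,)
    return (nxt, a[nxt], b[nxt])

first = [1, None, 3]
second = ["x", "y", "z"]
naive = foreach(make_zip_iter(first, second), None)
indexed = foreach(zip_indexed, (first, second))
```

The naive version stops after the first pair because the second element of `first` is null; the indexed version visits all three pairs but forces every caller to ignore a dummy variable.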

Thread iterators are really simple to write and use, and use out-of-band signaling to end the loop (when the thread dies), but they're pretty heavyweight. They're great for when you want to do a lot of processing and abstract it all behind a foreach loop, but you wouldn't want to invoke one to iterate over an array or table. So there's still a definite use case for lightweight iteration, it's just.. how do you redesign it to be more straightforward?

An important question is: how bad is it, really, to have to allocate something on the heap to perform an iteration? I don't want to discount a solution JUST because it might force you to allocate something at the beginning of each foreach loop. Of the built-in and stdlib types, really the only ones that DON'T need to allocate any state are the ones that are suuuuuuper simple list-like objects: arrays, memblocks, StringBuffers, Vectors, Regexps. Just about everything else (including tables and even strings, due to the difference between codepoint and byte offsets) requires at least a little more state than what the iterator function protocol gives you -- which is just "the object being iterated" and "the index of the last iteration". So it seems that allocating state to do iteration will be the rule rather than the exception, and maybe that isn't such a big problem.

So let's see what other languages have done to solve this.

Iterator objects

Like in Java. It's a small object, usually only with a couple methods like "hasNext" and "next", and maybe a "remove". While this interface is simple, it's a little more heavyweight than I'd like. You have to make a class, separate from the object it belongs to (or perhaps use a table?), and this class has a LOT of boilerplate. You might even have to create two different classes, one for a "removable" iterator and one for a read-only one, because forcing EVERY iteration to allow removing is inefficient. One iterator object consists of the class instance AND a namespace to hold its fields, and a namespace has a blob of memory; that's three memory allocations for one iteration. Ehhhhh, don't think so.

Inversion of control

Like D. The foreach loop's body becomes a closure that's passed to the object's opApply method. It's a clever solution and makes iterators easy to write, but has a major problem unique to Croc: seamlessly returning multiple values from inside the loop becomes very tricky. You have to somehow save an arbitrary number of results, return THROUGH the opApply metamethod, and then return the results that you saved. I can't think of any way to do this short of introducing extra opcodes/complexity in the interpreter and/or allocating all the return values on the heap, somehow. Also, debugging/stepping through inside-out foreach loops is, in my experience, rather annoying B| Not to mention the extra entries in exception tracebacks due to it.

Generator functions

Like in Squirrel. These are a form of restricted thread, and like threads, they allow you to yield some values while saving the state of the generator, and then the generator can be resumed to get the next values. Unlike a thread, a generator can only yield from the top-level function; this means that the stack space that it needs to store between yields is constant and can be allocated in a single blob along with a generator function object. This also means that a new type would have to be introduced. Generator functions give you a lot of the ease of using threads (or inversion of control), but without the overhead; in fact, if generators were available, many thread iterators could probably just be turned into generators instead. The downside of generator functions is increased complexity in the interpreter. Now there's ANOTHER callable type to deal with; the interpreter has to keep track of whether or not it's inside a generator; things work differently inside generators.. blah blah blah. It's a good bit of work, but it IS an attractive solution.
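Python generators have the same restriction (they can only yield from their own frame), so they give a feel for what this would look like. The sketch is Python, not Squirrel or Croc, and `zipped` is just an example name.

```python
def zipped(a, b):
    """Yield pairs from two sequences; None is an ordinary value here."""
    for i in range(min(len(a), len(b))):
        # The loop ends when the generator finishes, not when a yielded
        # value happens to be None, so no dummy index is needed.
        yield a[i], b[i]

pairs = list(zipped([1, None, 3], ["x", "y", "z"]))
```

End-of-iteration is signaled out of band, as with thread iterators, but the saved state is a single fixed-size frame.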

So generators look nice. There's also a nice parallel between them and threads; all iteration is handled by threadlike objects. Neat.

Maybe! Possibly! Probably!

Put time.sleep in os?

Really, it's more of an OS function, and malicious code could sleep forever.

assert attempts to throw a string

Croc Command-Line interpreter
Use exit() or Ctrl+D to end.

assert(false)
TypeException at CLI(1): Attempting to throw a 'string'; must be an instance of
a class derived from Throwable

Maybe get rid of namespace literals/decls?

I don't think I've ever actually used one. Well, maybe once or twice, but nothing major. Seriously, most of the time you need a namespace it's automatically created, and the few times you need to make one of your own (like for making a sandbox), either you're going to be making it from the native API or you're going to be filling it up with stuff that already exists.

Getting rid of namespace literals/decls removes a bunch of crap in the compiler and interpreter that's probably barely ever used, and it can easily be replaced by a "newNamespace" function that just takes a table holding the members. Whatever!