The Silo Programming Language
The code fragment below gives a cryptic error (note the missing equals sign):
message : Integer checkcast(Integer, actor.read())
What happens if you have a native function that returns void? Will the compiler artificially insert a null for you?
Example:
public class foo extends Function {
@Function.Body
public static void invoke(ExecutionContext context) {
System.out.println("Hi!");
}
}
A function like this:
func(main(null => String) {
while(true { return("Hello") })
})
will cause a compilation error because the compiler will think "Object" is being returned rather than "String", since the while loop will propagate an "Object" on the stack.
Create a Silo REPL shell that allows developers to execute commands. The shell would also need to include a sandbox capability because I would want to deploy this shell interface over the Web for a "Try in your Browser".
Variables
The first challenge is supporting local variables in a sane way. One way that I could handle this is to macro expand the code and then re-write assignment nodes (=) so that they use a HashMapActor or something. This HashMap actor can be serialized and even stored in a cookie to reduce server-side state for the Web app.
Runtime
Each session should run in its own runtime so I can avoid name clashes.
Security
The first thing that I need to do is protect against obvious security issues. I can do this by enabling sandboxing in the JVM with a security manager as described here:
http://stackoverflow.com/questions/502218/sandbox-against-malicious-code-in-a-java-application
In addition, I need to be wary of users that create threads. To prevent creating threads I need to look into ThreadGroups and setting the security manager to prevent new threads from being created.
This is a good start, but I also need to be wary of runaway code that could run in an infinite loop or attempt to allocate a huge amount of memory. To prevent this, I need to run each command on a new thread with a timeout, so that it is canceled after a certain amount of time. There are several ways of doing this. One simple way is to use Executors.newSingleThreadExecutor and Future#get(DURATION, TimeUnit.SECONDS) followed by ExecutorService#shutdownNow(), like here:
http://stackoverflow.com/questions/2275443/how-to-timeout-a-thread
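A minimal sketch of that timeout pattern, using only standard java.util.concurrent calls. The class name TimedCommand, its run method, and the onTimeout fallback value are hypothetical names chosen for illustration and are not part of Silo.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not part of Silo): runs one command with a time limit.
final class TimedCommand {
    static <T> T run(Callable<T> command, long timeout, TimeUnit unit, T onTimeout)
            throws InterruptedException, ExecutionException {
        // One fresh worker thread per command.
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = executor.submit(command);
            try {
                // Wait for the result, but only up to the given timeout.
                return future.get(timeout, unit);
            } catch (TimeoutException e) {
                return onTimeout;
            }
        } finally {
            // Interrupts the worker thread. Code that ignores interruption
            // keeps running, so this alone does not stop a hostile command.
            executor.shutdownNow();
        }
    }
}
```

A call such as TimedCommand.run(command, 5, TimeUnit.SECONDS, null) would return null if the command is still running after five seconds.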
However, we are not there yet. This only works if the user code pays attention to Thread.interrupt(), and a malicious thread may ignore that. To avoid that we could use "Thread.stop()", which is deprecated but is an option, or we could macro expand the code and then insert if(thread.isInterrupted()) throw("foo"). Both of these work nicely; however, a malicious thread could still circumvent them by catching all exceptions (including the ThreadDeath error thrown by Thread.stop()) and ignoring them. Which leads us to our last security precaution, a simple blacklist:
https://github.com/Raynes/tryclojure
https://github.com/Raynes/clojail/blob/master/src/clojail/testers.clj
Basically, we blacklist certain commands / forms that we do not like. The try-catch statement is one of those.
Fall Back
As a fallback, I should kill the JVM process every 20 minutes to ensure proper quality of service. To do this, I should have a Silo program that is monitoring another Silo program running in another JVM. This will avoid any runaway threads that may be leaked.
Additionally, the Silo program should be run on a dedicated Linux box that is disconnected from everything else (so no SSH keys, etc.). The OS user account should also be a low-privilege user. This is not as simple as it may seem since I would want the process to be listening on port 80. Perhaps I create a Silo reverse proxy application (or just use nginx / haproxy, which is less interesting).
If a resumed Fiber throws an exception, the function that catches the exception will not have its local variables initialized correctly and will also not clear the relevant execution frames to prevent replaying the execution over and over again.
The first step is easy - inside all catch blocks I should emit code that re-sets the local variables if the ExecutionContext has a status of THROWING (as opposed to RUNNING or YIELDING).
The second step is much harder - how do I invalidate all of the ExecutionFrames? I need a mechanism that allows me to determine which ExecutionFrame belongs to the current function, find it, and then remove all ExecutionFrames on top of it. One idea is to embed an interned string with the fully qualified name of the function. I then call a utility method that iterates through the ExecutionFrames in reverse and returns the first frame that matches the fully qualified name. It then "nulls" out the rest.
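The utility method described above could look roughly like the following sketch. The Frame class is a hypothetical stand-in for Silo's ExecutionFrame that only carries the owning function's fully qualified name, and FrameUnwinder and invalidateAbove are invented names. The sketch compares names with equals(); with interned strings a reference comparison would also work.

```java
import java.util.List;

// Hypothetical sketch: Frame stands in for Silo's ExecutionFrame.
final class FrameUnwinder {
    static final class Frame {
        final String functionName; // fully qualified name of the owning function
        Frame(String functionName) { this.functionName = functionName; }
    }

    /**
     * Walks the frames from the top of the stack downwards, finds the first
     * frame owned by functionName, and nulls out every frame above it.
     * Returns the index of the matching frame, or -1 (leaving the list
     * untouched) if no frame matches.
     */
    static int invalidateAbove(List<Frame> frames, String functionName) {
        for (int i = frames.size() - 1; i >= 0; i--) {
            Frame frame = frames.get(i);
            if (frame != null && frame.functionName.equals(functionName)) {
                for (int j = i + 1; j < frames.size(); j++) {
                    frames.set(j, null);
                }
                return i;
            }
        }
        return -1;
    }
}
```

Searching from the top means that, for a recursive function, the most recent activation is the one that is matched.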
Create a custom class for each call site with fields and static initialization method that matches the call site's JVM operand stack.
This has several benefits. First, it makes removing all items from the stack very efficient, since the static initialization method can "pop" all of the elements off the stack in one go. Additionally, it does not require any auto-boxing and requires minimal casting (only when restoring the stack). Second, it reduces the emitted code size. Instead of emitting a separate pop for each operand, it is all done in one go, and instead of swapping to create an array, it is a single static method call. Additionally, I eliminate the need for many of the casts.
While I do this, I should also add two other "special" variables: returnValue and currentFrame. These variables are used to avoid having to constantly load the ExecutionContext and ExecutionFrame (through a virtual method call) and to avoid having to swap the return value when clearing dummy values during coroutine resumption.
The challenge with this is that it causes me to generate a lot of small classes. I cannot dynamically create the StackFrames at runtime and re-use them across functions because of AOT compilation --- the class loader used to load a stdlib function will likely not be a RuntimeClassLoader. However, what I can do is share custom StackFrames across functions that use the same CompilationContext. This will likely eliminate a lot of the small temporary classes while not having to worry about AOT compilation issues.
When creating these custom stack frames, I need to be wary about duplicated names and clashes. I am of two minds here. First, if there is a name conflict, it shouldn't matter because the two classes will technically be exactly the same (the types of the operand stack are represented in the name, like frame_IIL for (int, int, object)). On the other hand, I worry that this will create issues in two contexts: (1) the JVM may reject the class and create a ClassCastError because an internal manifest field may be different and (2) it may cause the class writing to skip writing the stack frame class to disk, and when the files are copied to a new machine the stack frame may be missing. Thus, I am also debating using a UUID to name the stack frames since they will practically never collide with one another.
For some reason I cannot create a Long literal. This does not work:
l : long = 500L
The workaround for now is silly:
l : long = Integer(500)#longValue()
Right now, top-level code could create duplicated __function__1 definitions. I should consider how I am going to solve that problem.
A good test / use case for this is when you want to add a new function to an existing AOT compiled package. That would certainly create multiple ___function___1 definitions.
The following code (using the HTTP API) creates a compilation error.
try({
connection.readAll(c)
connection.writeAll(c, 200, null, "Hello, World!");
} catch(e : Exception) {
println("exception")
} finally {
println("Finally!")
})
error: java.lang.RuntimeException: java.lang.VerifyError: (class: handler, method: invoke signature: (Lsilo/lang/ExecutionContext;Lsilo/net/http/connection/Connection;)Ljava/lang/Object;) Inconsistent stack height 3 != 2
I wrote the following code:
if(...) {
}
(Note the braces are NOT inside the parens).
This led to a weird cryptic error message that was hard to track down.
Support string concatenation using the plus operator. Example:
s : String = "Hello" + "World!" + 9
Reading the online documentation, it seems as if anonymous classes load and unload faster. I am not sure if this will lead to any noticeable performance improvements, but it is something to look into --- especially for custom stack frame classes.
https://wikis.oracle.com/display/HotSpotInternals/PerformanceTechniques --- Look near the bottom under "Miscellaneous"
https://blogs.oracle.com/jrose/entry/anonymous_classes_in_the_vm
Profile the compiler and figure out any bottlenecks. In particular, why does core.silo take so long to compile?
One of my current guesses is that Node#getChildren() is really slow since it has to copy everything and is called a bunch of times. Once replaced with persistent vectors, it could be much faster.
Should I allow functions to be overloaded?
Make the following code give a more descriptive error message:
c : Class = someExpressionReturningObject
Add a simple test that ensures that void returns from Java don't mess everything up.
map : HashMap = HashMap()
fn(h : HashMap, h#clear())(map)
Also
// Should be a type error
i : int = map#clear()
Also
// 'i' should be null
i : Object = map#clear()
If I want access to the File operation commands I cannot do:
import(silo.io.file)
file.path("foo", "bar")
What I want to be able to do is something like:
include(silo.io.file)
Alias will also not work.
Right now, the runtime has no hook to shutdown the ExecutorService. Does this happen automatically?
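For context: in standard java.util.concurrent, a pool created with Executors.newFixedThreadPool and the default thread factory uses non-daemon threads and is not shut down automatically, so something has to call shutdown(). A minimal sketch of such a hook follows, assuming the runtime holds a plain ExecutorService. PoolShutdown and shutdownGracefully are hypothetical names, not existing Silo API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not part of Silo) for stopping a runtime's pool.
final class PoolShutdown {
    /**
     * Stops accepting new tasks, waits for running tasks to finish, and
     * falls back to interrupting them if they take too long.
     * Returns true if the pool terminated within the allotted time.
     */
    static boolean shutdownGracefully(ExecutorService pool, long timeout, TimeUnit unit) {
        pool.shutdown(); // previously submitted tasks still run to completion
        try {
            if (!pool.awaitTermination(timeout, unit)) {
                pool.shutdownNow(); // interrupt whatever is still running
                return pool.awaitTermination(timeout, unit);
            }
            return true;
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }
}
```

One option would be to call this from a thread registered with Runtime.getRuntime().addShutdownHook, or from an explicit shutdown method on the Silo runtime.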
Support the full gamut of operators with String objects. For example, all of the following should be valid and performed using String#compareTo.
a : String = ...
b : String = ...
a < b
a > b
a <= b
a >= b
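As a sketch of what the compiler could emit for these operators, each relational operator maps onto the sign of String#compareTo. StringOperators and its method names are hypothetical and only illustrate the mapping.

```java
// Hypothetical sketch: how each relational operator on Strings could be
// lowered to a String#compareTo call.
final class StringOperators {
    static boolean lessThan(String a, String b)           { return a.compareTo(b) < 0; }
    static boolean greaterThan(String a, String b)        { return a.compareTo(b) > 0; }
    static boolean lessThanOrEqual(String a, String b)    { return a.compareTo(b) <= 0; }
    static boolean greaterThanOrEqual(String a, String b) { return a.compareTo(b) >= 0; }
}
```

Note that compareTo orders strings lexicographically by UTF-16 code unit, so uppercase letters sort before lowercase ones.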
Most operators are implemented using right associativity when they should be left. The # operator is especially troublesome and should be updated sooner rather than later.
It would be nice to be able to scope imports and aliases to a package block. So, for example:
package(silo.net.http.connection {
import(silo.net.http)
server.createServer(...) // Okay
})
server.createServer(...) // [server] not found because the import is no longer active.
Augment the runtime with the following fields and methods
Runtime
- ExecutorService taskPool <-- Fixed
- ExecutorService backgroundTaskPool <-- Cached
- ConcurrentHashMap<String, Actor> actors
- HashMap<String, Integer> pendingActors;
- sendMessage()
- scheduleActor()
- unscheduleActor()
Actor
- String address
- ArrayList mailbox <-- All access is synchronized
- ExecutionContext context
Scheduling Algorithm
The scheduleActor() method is completely synchronized. When it is called, it checks pendingActors to see if the actor is currently pending. If the actor is not in the HashMap then the actor will be sent to the taskPool. If the actor is there, the number is incremented.
After the taskPool executes an actor it calls unscheduleActor, which is synchronized on the same lock as scheduleActor. unscheduleActor will look into pendingActors and check whether the number is zero. If the number is zero, then it will remove the actor from the map and return. If the number is non-zero, it will set the value to zero and immediately re-execute the task.
Basically, this algorithm gives the actor "one more chance" to execute before unscheduling it. It is somewhat likely that the actor will be resumed twice (which is okay because resuming a fiber is really fast) but it is very unlikely that it will be resumed more than that because the second time will likely start up, check that there is nothing in the mailbox, and immediately quit again, all very quickly. Thus, the window for another thread to schedule the actor is really small.
This two-way checking is needed to avoid a race condition in which an actor is scheduled but is never actually sent to the taskPool. This becomes a problem since we want to prevent an actor from being executed concurrently by two threads.
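The scheduling algorithm above can be sketched as follows. Several details are assumptions made for illustration: the class name ActorScheduler, the resume callback standing in for "execute the actor", the isPending helper, storing an initial count of zero when an actor is first scheduled, and re-executing on the same worker thread rather than resubmitting to the pool. The pool is typed as Executor (which ExecutorService extends) so the sketch is easy to drive by hand.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

// Hypothetical sketch of the scheduling algorithm; not Silo's actual Runtime.
final class ActorScheduler {
    private final Map<String, Integer> pendingActors = new HashMap<>();
    private final Executor taskPool;
    private final Consumer<String> resume; // assumed: resumes the actor at this address

    ActorScheduler(Executor taskPool, Consumer<String> resume) {
        this.taskPool = taskPool;
        this.resume = resume;
    }

    synchronized void scheduleActor(String address) {
        Integer count = pendingActors.get(address);
        if (count == null) {
            // Not pending: record it and hand it to the pool exactly once.
            pendingActors.put(address, 0);
            taskPool.execute(() -> runActor(address));
        } else {
            // Already pending: just remember that more work arrived.
            pendingActors.put(address, count + 1);
        }
    }

    // Runs on a pool thread; keeps re-executing until unscheduleActor succeeds.
    private void runActor(String address) {
        do {
            resume.accept(address);
        } while (!unscheduleActor(address));
    }

    // Returns true if the actor was removed, false if it needs one more run.
    synchronized boolean unscheduleActor(String address) {
        Integer count = pendingActors.get(address);
        if (count == null || count == 0) {
            pendingActors.remove(address);
            return true;
        }
        pendingActors.put(address, 0); // give the actor "one more chance"
        return false;
    }

    synchronized boolean isPending(String address) {
        return pendingActors.containsKey(address);
    }
}
```

Because an address is submitted to the pool only when it is absent from pendingActors, and is removed only by the thread that is running it, at most one thread executes a given actor at a time.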
Attempts to connect to a non-existent host (for example, on a port that is not being listened on) cause the client to hang.
Code blocks {...} currently compile to do(...). Instead, they should compile to a node with a null label.
I need to provide some capabilities to allow Silo code to create and extend Java classes and interfaces.
In the case of classes (which is the more general case) I need syntax to:
Importantly, the following are non-goals and do not need to be supported:
final semantics
class(Reader(Object, Comparable, List) {
field(foo : int = 5)
field(public bar : int = 5)
init(this : Reader, foo : int {
this.foo = foo
})
method(public toString(this => String) {
"Foo is: " + this.foo
})
})
Adding an ignore macro could be interesting.
f : File = null
ignore(FileNotFoundException {
f = File.open("foo.txt")
})
If either a math or relational operation is performed on a wrapper class (Integer, Double, etc.) the operands should be unboxed and the operation should proceed as normal.
In the case of a math operation, the outcome should be boxed up once again at the end.
If you have methods:
void foo(CharSequence, Object) // 1
void foo(String, Object) // 2
The current compiler cannot determine that you should pick the second because String is more specific. The issue is inside Invoke#resolveFunctionByArguments on the following line:
if(java.util.Arrays.equals(parameters[options.get(index)], parameters[options.get(i)]))
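One possible direction, sketched under assumptions: instead of testing parameter arrays for equality, compare them pairwise with Class#isAssignableFrom and keep the candidate whose parameters are all at least as specific as every other candidate's. OverloadResolver and its method names are hypothetical, and this is not the actual logic inside Invoke#resolveFunctionByArguments.

```java
import java.util.List;

// Hypothetical sketch of a "most specific overload" check.
final class OverloadResolver {
    // True if each parameter type in a can be passed where b expects its type.
    static boolean isAtLeastAsSpecific(Class<?>[] a, Class<?>[] b) {
        if (a.length != b.length) {
            return false;
        }
        for (int i = 0; i < a.length; i++) {
            if (!b[i].isAssignableFrom(a[i])) {
                return false;
            }
        }
        return true;
    }

    /**
     * Returns the index of the first candidate that is at least as specific
     * as every other candidate, or -1 if the call is ambiguous.
     */
    static int mostSpecific(List<Class<?>[]> candidates) {
        for (int i = 0; i < candidates.size(); i++) {
            boolean best = true;
            for (int j = 0; j < candidates.size(); j++) {
                if (i != j && !isAtLeastAsSpecific(candidates.get(i), candidates.get(j))) {
                    best = false;
                    break;
                }
            }
            if (best) {
                return i;
            }
        }
        return -1;
    }
}
```

For the two foo overloads above, (String, Object) is at least as specific as (CharSequence, Object) but not the other way around, so the second overload is selected.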
I need to ensure that a Fiber cannot ever be executed by multiple threads concurrently. I don't know exactly how I can do this. This can happen if you were to send a Fiber to another Actor (which I want to support). Perhaps Fibers can be made to be immutable?
Running the following code:
l : long = 5
i : Integer = Integer(l)
is obviously wrong, but the error message is strange and cryptic.
Right now Connection has headers but it does not expose trailing headers. What is the best way to support that in the future?
Example code can be seen here:
It should be:
instanceof(o, java.util.Vector)
and not:
instanceof(java.util.Vector, o)
that way you can pipe them to one another:
o | instanceof(java.util.Vector)
Similar to the Python SimpleHTTPServer. Basically, you run it on the command line and it serves the current directory that you are in.
There are a couple of things that go along with this. The first is the notion of system-wide installs for 3rd party tools. This could be similar to NPM or Gem. I need to figure this out because this tool should NOT be part of the Silo standard library.
Second, it forces me to figure out the HTTP pipeline for the Silo core library.
Start building out the standard IO libraries.
Support == and != operators for non-objects. Note that != null should be treated specially, using the JVM's null-check opcodes (ifnull / ifnonnull).
In HttpServerHandler when I enable this:
this.actor = runtime.spawn(connection.actorId, handle, handler, connection)
the Apache Bench starts to crap out on connection. I am not sure why. I am starting to suspect that it could be calling the function dynamically with Function#apply.
Add a test case that creates a fiber that calls a function that calls an anonymous function before yielding. Something like this should be good:
func(foo() {
f : Function = fn(return(fiber.yield()))
f()
})
fiber : fiber.Fiber = fiber.Fiber(foo)
fiber.resume(fiber)
Functions that are defined in a finally block will be duplicated during compilation since the finally block essentially copies code into different places. This currently leads to an error message that is hard to decipher. I should make it more straightforward.
Create special forms for quote and syntax quote.
Create an API that allows nodes to be matched against a pattern. Look to regular expressions, Haskell, Caml for inspiration.
In particular, Node makes heavy use of java.util.Vector, which needs to be replaced.
Add a safety check to see if a fiber cannot be resumed and is "dead". This means that the function that the fiber was calling returns naturally with an ExecutionContext status of RUNNING. This fiber should NOT be allowed to be resumed as weird things could happen.
As seen here:
How should I support this?
If a function does not perform any blocking operations that would cause the current coroutine to be switched out, I can execute that function as a normal Java method instead of including the ceremony of beginCall(), endCall(), stack store, and stack restore. Practically speaking this will likely not improve performance but it will reduce code size.
All functions are considered to be blocking unless specifically told that they are not. Thus, NO_BLOCK is opt-in. Certain library calls like vector.create(...) will be marked as NO_BLOCK because they just do computation. If a function
Create async.spawn. Note that you do not need to do anything else with this approach to Async because it will have runtime support and no other functions are needed. I will use async.spawn (instead of actor.spawn) and then the normal actor API (read, receive, etc.)
I had a file that looked like:
options : silo.lang.FooBarBaz = silo.lang.FooBarBaz()
When run with silo file.silo no exception was reported that "FooBarBaz" does not exist.
The Runtime.actorExecutor service is a fixed size thread pool. It would be nice to be able to change the size of the thread pool dynamically - is that something that can be done? A key use case is if an actor wants to "lock" the underlying Java thread so that it executes the actor and no other actor. However, you would also want to replace this thread with another one so it does not block the entire system. Is that something that could be done?
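On the resizing question: java.util.concurrent.ThreadPoolExecutor does allow its size to be changed after construction through setCorePoolSize and setMaximumPoolSize. The sketch below assumes the runtime builds the pool as a ThreadPoolExecutor directly. ResizablePool, fixed, and resize are hypothetical names, and the "lock a thread to an actor" behavior itself is not covered here.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a fixed-size pool whose size can be changed later.
final class ResizablePool {
    // Same shape as a fixed thread pool: core size == maximum size, unbounded queue.
    static ThreadPoolExecutor fixed(int size) {
        return new ThreadPoolExecutor(size, size, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());
    }

    // Changes both sizes while keeping core <= maximum at every step.
    static void resize(ThreadPoolExecutor pool, int newSize) {
        if (newSize > pool.getMaximumPoolSize()) {
            // Growing: raise the ceiling first, then the core size.
            pool.setMaximumPoolSize(newSize);
            pool.setCorePoolSize(newSize);
        } else {
            // Shrinking (or unchanged): lower the core size first, then the ceiling.
            pool.setCorePoolSize(newSize);
            pool.setMaximumPoolSize(newSize);
        }
    }
}
```

With something like this, an actor that pins a thread could be compensated for by growing the pool by one, and the pool could be shrunk again once the actor releases the thread.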
There are a couple of areas where I need to be mindful of the transition: