ashwin153 / caustic
A transactional programming language.
Home Page: https://madavan.me/projects/caustic.html
License: Apache License 2.0
In order to do #6, a rich cross-language interface needs to be put in front of the transaction execution engine.
Syntax package documentation is very inadequate.
Write a bash script to simplify the release of build artifacts to Maven Central.
Hook up the project to Travis CI to make sure we aren't breaking any tests on push, and to automate deployment.
Chaining multiple commands together in a cons is extremely verbose.
def block(a: Transaction, b: Transaction, rest: Transaction*): Transaction =
rest.foldLeft(cons(a, b))((x, y) => cons(x, y))
This lets you write block(a, b, c, d) instead of cons(cons(a, b), cons(c, d)).
Add a Map type to the Caustic language. This Map type would be an implementation of a B+ tree that allows efficient bulk lookups of record attributes. For example, you could find all records where a certain field is less than or greater than a particular value.
Other collection types like List and Set can be implemented using a Map. For example, a List is a Map with integer keys, and a Set is a Map from keys to themselves.
In order to deal with large collections, limit the maximum number of keys prefetched at a time.
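As a sketch of the encodings above (not the actual Caustic implementation; CMap, CList, and CSet are hypothetical stand-ins), a List and a Set can be built from nothing but a Map:

```scala
import scala.collection.mutable

// Hypothetical stand-in for the proposed Caustic Map type.
class CMap[K, V] {
  private val underlying = mutable.Map.empty[K, V]
  def put(k: K, v: V): Unit = underlying(k) = v
  def get(k: K): Option[V] = underlying.get(k)
}

// A List is a Map with integer keys.
class CList[V] {
  private val map = new CMap[Int, V]
  private var size = 0
  def append(v: V): Unit = { map.put(size, v); size += 1 }
  def apply(i: Int): Option[V] = map.get(i)
}

// A Set is a Map from keys to themselves.
class CSet[V] {
  private val map = new CMap[V, V]
  def add(v: V): Unit = map.put(v, v)
  def contains(v: V): Boolean = map.get(v).isDefined
}

val list = new CList[String]
list.append("a")
list.append("b")

val set = new CSet[Int]
set.add(3)
```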
This build failure is due to unit tests being run before the Postgres table is created. Add a latch to the unit tests so that they only run after setup is complete.
There's too much namespace overlap.
GitHub wiki pages are extremely incomplete and out-of-date.
Insert runtime transactions directly into caustic programs
Add support for indexOf, toUpperCase, toLowerCase, and trim.
Object
Dynamic Typing
Static Typing
Right now it is impossible to link together Caustic files. We'll need to add some dependency resolution logic into the compiler.
Create a website where you can write out transactions and run them against a database directly in the browser. A useful way to learn web development, and a great tool for people to test out code. In practice, this would work just like the Caustic syntax unit tests.
The runtime is executable, but it requires the system to be "just so" in order to function properly. Make the executable configuration file-based instead of programmatic, and get it to run within a docker container. This will make it really easy to spin up server instances.
Add support for Memcached, Redis, and Elasticache.
The caustic-syntax package is desperately in need of a None type. There is currently no way to distinguish between no value and the empty string.
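A minimal sketch of the distinction, using Scala's Option as the model (the store and key names are illustrative, not the caustic-syntax API):

```scala
// The store below is a stand-in for the database: "present-but-empty" is
// bound to the empty string, while "missing" has no binding at all.
val store = Map("present-but-empty" -> "")

def read(key: String): Option[String] = store.get(key)

val empty   = read("present-but-empty") // Some(""): a value that happens to be empty
val missing = read("missing")           // None: no value at all
```

With only String literals, both lookups would collapse to "" and become indistinguishable.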
Users shouldn't have to depend on all of Akka just to schedule retries, and the syntax for creating an implicit ActorSystem is clunky. It is much preferable to schedule TimerTasks on the underlying ExecutionContext.
In order to generate Thrift Scala code for #24, we must first add support for Scrooge.
./pants.ini
Scrooge requires a special pants plugin.
plugins: [
'pantsbuild.pants.contrib.scrooge==%(pants_version)s',
]
./BUILD.tools
Scrooge requires a special generator and linter library.
jar_library(name = 'scrooge-gen',
jars = [
jar(org='com.twitter', name='scrooge-generator_2.12', rev='4.18.0', excludes=[
exclude(org='org.apache.thrift', name='libthrift')
])
],
dependencies=[
'3rdparty:thrift-0.6.1',
]
)
jar_library(name = 'scrooge-linter',
jars = [
jar(org='com.twitter', name='scrooge-linter_2.12', rev='4.18.0', excludes=[
exclude(org='org.apache.thrift', name='libthrift')
])
],
dependencies=[
'3rdparty:thrift-0.6.1',
]
)
./3rdparty/jvm/BUILD
Scrooge requires version 0.6.1 of Thrift.
jar_library(name='thrift-0.6.1', jars = [
jar(org='org.apache.thrift', name='libthrift', rev='0.6.1')
])
./caustic-runtime/src/main/thrift/BUILD
java_thrift_library(
name='scala',
compiler='scrooge',
language='scala',
sources=rglobs('*.thrift'),
)
Memoize Goal execution to avoid re-running it on the same source file.
Write a program that converts Schema transactions into runnable ANSI SQL. Obviously, certain Schema transactions are extremely difficult or impossible to express in ANSI SQL. This is a useful way to compare Schema and SQL both for marketing and for debugging.
class Transaction {
  def toSQL: String = this match {
    case Read(k) =>
      s"""SELECT value WHERE key = "${k.toSQL}""""
    case Write(k, v) =>
      s"""INSERT INTO table (key, value) VALUES (${k.toSQL}, "${v.toSQL}") ON DUPLICATE KEY UPDATE key = "${k.toSQL}" AND value = "${v.toSQL}""""
    case Literal(x) =>
      x.toString
    case Cons(x, y) =>
      s"""${x.toSQL}; ${y.toSQL}"""
    case Add(x, y) =>
      s"""${x.toSQL} + ${y.toSQL}"""
    ...
    case default =>
      throw new UnsupportedOperationException(s"$default is inexpressible in SQL")
  }
}
Some operations like repeat, prefetch, load, store, and branch are inexpressible in SQL. For these operations, an exception is thrown to indicate that the transaction cannot be translated.
Unused dependencies and adopting arguments.
Add support for Python and possibly C++, Rust, and/or JavaScript.
Before Schema can be safely used in production, we need to test its performance under various workloads (high read, high write, high contention, etc.). Performance is generally the main concern people have with this project, so benchmarks should be a top priority to convince people of its viability.
Database size.
Implement benchmarks for YCSB.
The ./pants.ini file is getting way too complex. It will be much easier to find issues if the different sections are alphabetized. Furthermore, we should adhere to Pants conventions, like naming the file publish.ivysettings.xml instead of ivy-publish.xml.
Make it easier to bootstrap a Database server, and simplify the process of connecting clients and executing transactions. Implement service discovery to allow servers to automatically register themselves and clients to automatically discover them. Service discovery can be easily implemented on top of ZooKeeper using standard Curator recipes.
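The register/discover flow might look like the following sketch. The in-memory Registry is a hypothetical stand-in for the ZooKeeper-backed store that a Curator service-discovery recipe would provide:

```scala
import scala.collection.mutable

case class ServerAddress(host: String, port: Int)

// In-memory stand-in for a ZooKeeper-backed registry.
class Registry {
  private val servers = mutable.Set.empty[ServerAddress]
  def register(s: ServerAddress): Unit = servers += s        // server announces itself
  def unregister(s: ServerAddress): Unit = servers -= s      // e.g. when its session expires
  def discover(): Option[ServerAddress] = servers.headOption // client picks any live server
}

val registry = new Registry
registry.register(ServerAddress("10.0.0.1", 9090))
```

With ZooKeeper, register would create an ephemeral node, so a crashed server disappears from the registry automatically.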
Schema is really difficult to search for; the name is too prevalent in the database world.
Suppose an object is referenced from an index. When the object is deleted, it should be removed from the index. This will make it so that you can store collections of objects and iterate over them. We'll associate with each object a list of indices that it is a member of.
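A sketch of the bookkeeping described above, assuming objects and indices are keyed by strings (all names here are illustrative):

```scala
import scala.collection.mutable

// index name -> keys of member objects
val indices = mutable.Map.empty[String, mutable.Set[String]]
// object key -> the indices it is a member of
val memberOf = mutable.Map.empty[String, List[String]]

def insert(key: String, into: List[String]): Unit = {
  memberOf(key) = into
  into.foreach(i => indices.getOrElseUpdate(i, mutable.Set.empty) += key)
}

// Deleting an object also removes it from every index it belongs to.
def delete(key: String): Unit =
  memberOf.remove(key).getOrElse(Nil).foreach(i => indices(i) -= key)

insert("user#1", List("by-name", "by-email"))
delete("user#1")
```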
A full-fledged programming language that compiles into Thrift/HTTP/Protobuf interfaces and implementations, with configurable storage engines, and that serves them when run.
record NameOfRecord {
bool x,
double y,
string z,
NameOfOtherRecord bar,
NameOfRecord& car,
}
service NameOfService {
def getAllUsers(foo: NameOfRecord): NameOfRecord {
if (foo.x) {
val i = 0
while (i < foo.y) {
foo.bar.z = "3"
foo.car.z = "3"
i++
}
}
return foo;
}
}
Right now the TransactionalDatabase creates a new snapshot each time it executes a transaction. Instead, this snapshot should be an LRU cache that is invalidated on unsuccessful writes.
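One way to sketch such an LRU cache on the JVM is java.util.LinkedHashMap in access-order mode (this is an illustration, not the actual TransactionalDatabase code):

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// LRU cache built on LinkedHashMap's access-order mode: the least recently
// used entry is evicted once capacity is exceeded.
class LruCache[K, V](capacity: Int) {
  private val underlying = new JLinkedHashMap[K, V](16, 0.75f, true) {
    override def removeEldestEntry(eldest: JMap.Entry[K, V]): Boolean =
      size() > capacity
  }
  def get(k: K): Option[V] = Option(underlying.get(k))
  def put(k: K, v: V): Unit = underlying.put(k, v)
  def invalidate(k: K): Unit = underlying.remove(k) // called on an unsuccessful write
}

val cache = new LruCache[String, String](2)
cache.put("x", "1")
cache.put("y", "2")
cache.get("x")      // touch "x", so "y" is now least recently used
cache.put("z", "3") // exceeds capacity and evicts "y"
```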
If def scan(prefix: Optional[String]): Iterator[String] is added to Database, then you can do efficient filtration queries on keys. For example, SQL-like queries of the form x LIKE 'abc%q' would translate to scan(r'abc').filter(_.matches('abc.*q')). You would also be able to iterate over the key space using scan(), which could be useful for implementing backfill/migration services on top of a Database.
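A sketch of how scan could work over a sorted key space, using an in-memory SortedMap as a stand-in for a Database:

```scala
import scala.collection.immutable.SortedMap

// Sorted key space standing in for a Database; keys are stored in order.
val db = SortedMap("abcq" -> "1", "abcxq" -> "2", "abd" -> "3", "zzz" -> "4")

// scan() without a prefix iterates over the entire key space.
def scan(prefix: String = ""): Iterator[String] =
  db.iteratorFrom(prefix).map(_._1).takeWhile(_.startsWith(prefix))

// The SQL query x LIKE 'abc%q' translates to:
val matches = scan("abc").filter(_.matches("abc.*q")).toList
```

Because the keys are sorted, the prefix scan only touches the matching range instead of the whole key space.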
Error messages are abysmal. Use ANTLR Error Nodes to propagate errors.
Rather than rely on databases to provide a conditional put implementation, which can be difficult to implement in distributed databases, Caustic will provide a generic implementation.
There are two kinds of conditional put operations, or transactions, that may be performed on a distributed database: distributed and local. Distributed transactions span multiple shards and require an expensive coordination operation to guarantee that transactions atomically commit or abort. Local transactions occur only on a single shard and do not require any coordination. Local transactions are clearly more efficient than distributed transactions. Therefore, the library should perform adaptive placement. Whenever a distributed transaction is performed, the library may decide to colocate keys so that future transactions will be local.
http://people.csail.mit.edu/idish/ftp/JCSS.pdf
http://rystsov.info/2012/09/01/cas.html
https://arxiv.org/pdf/1509.07815.pdf
https://www.cockroachlabs.com/blog/how-cockroachdb-distributes-atomic-transactions/
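A sketch of the conditional put primitive over a single in-memory shard, using versioned values (the names are illustrative, not the actual Caustic API):

```scala
import scala.collection.mutable

// key -> (version, value); versions make the put conditional.
val store = mutable.Map.empty[String, (Long, String)]

// Write `value` only if the caller's expected version matches the current one.
def conditionalPut(key: String, expected: Long, value: String): Boolean =
  store.synchronized {
    val current = store.get(key).map(_._1).getOrElse(0L)
    if (current == expected) { store(key) = (current + 1, value); true }
    else false
  }

val first = conditionalPut("k", expected = 0L, value = "a") // succeeds, version 0 -> 1
val stale = conditionalPut("k", expected = 0L, value = "b") // fails, version is now 1
```

A local transaction needs only this single-shard check; a distributed transaction must coordinate the same check across every shard it touches.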
Add support for Cassandra, RocksDB, CockroachDB, DynamoDB, and SQL Server.
Use the Backoff implementation in caustic-common to retry transaction conflicts. The implementation should not retry failures due to invalid transactions, etc. Furthermore, the backoff durations should be client-specified but server-implemented: clients tell the server to execute a transaction with x backoff durations, and the server actually performs the retried execution. If both happened on the server, the backoff durations would be fixed once the server starts. If both happened on the client, each retry would require an additional network round-trip, and you would pay the cost of transaction serialization and parsing on each attempt.
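A sketch of the proposed split, with a client-supplied backoff schedule executed on the server side (ConflictException and the retry loop are illustrative, not the caustic-common API):

```scala
import scala.annotation.tailrec
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

case class ConflictException(message: String) extends Exception(message)

// The server executes the transaction, sleeping between attempts according to
// the client-supplied backoff schedule. Non-conflict failures are not retried.
@tailrec
def execute[T](attempt: () => T, backoffs: List[FiniteDuration]): Try[T] =
  Try(attempt()) match {
    case Success(v) => Success(v)
    case Failure(_: ConflictException) if backoffs.nonEmpty =>
      Thread.sleep(backoffs.head.toMillis)
      execute(attempt, backoffs.tail)
    case Failure(e) => Failure(e)
  }

var attempts = 0
val result = execute(() => {
  attempts += 1
  if (attempts < 3) throw ConflictException("write conflict") else "ok"
}, List(1.millis, 1.millis, 1.millis))
```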
When running the database thread-safety test on MySQL and PostgreSQL, the test sometimes passes and sometimes fails.
Because the database execution logic is implemented using head recursion, it causes a StackOverflowError for large transactions.
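The fix is to make the recursion tail-recursive (or trampolined) so the compiler can turn it into a loop. A sketch with a stand-in computation, where the list plays the role of a large flattened transaction:

```scala
import scala.annotation.tailrec

// Head-recursive: one stack frame per element; overflows on large input.
def sumHead(xs: List[Int]): Long =
  if (xs.isEmpty) 0L else xs.head + sumHead(xs.tail)

// Tail-recursive equivalent: the compiler rewrites it into a loop,
// so stack usage is constant regardless of input size.
@tailrec
def sumTail(xs: List[Int], acc: Long = 0L): Long =
  if (xs.isEmpty) acc else sumTail(xs.tail, acc + xs.head)

// A million elements: fine for sumTail, would overflow with sumHead.
val total = sumTail(List.fill(1000000)(1))
```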
Right now all literals are of type String. This makes it very difficult for the transaction execution engine to provide (1) meaningful return values, (2) descriptive error messages, and (3) important functionality (appending to lists, etc.). It also makes the execution engine do unintuitive things (previously, True + True == 2 because True == 1) and complicates the evaluation of a transaction (previously, the prefetch operator took two steps to reduce, because it was first translated to a list of read operators).
I propose the following base types (more complicated types can be constructed from them):
Boolean -> bool
Number -> double
Text -> string
Sequence -> array
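A sketch of the proposed base types as an algebraic data type, showing how typed addition avoids the True + True == 2 behavior (the names are illustrative):

```scala
// Typed literals as an algebraic data type.
sealed trait Literal
case class Bool(value: Boolean) extends Literal
case class Num(value: Double) extends Literal
case class Text(value: String) extends Literal

// Addition is now defined per type, so booleans no longer coerce to numbers.
def add(x: Literal, y: Literal): Literal = (x, y) match {
  case (Num(a), Num(b))   => Num(a + b)   // numeric addition
  case (Text(a), Text(b)) => Text(a + b)  // concatenation
  case (Bool(a), Bool(b)) => Bool(a || b) // logical or, not True + True == 2
  case _ => throw new IllegalArgumentException(s"Cannot add $x and $y")
}
```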
Integrate the Caustic compiler with Pants to make it easier to compile and run Caustic programs.