xdb's People

Contributors

dependabot[bot], shivam010, sumukha-pk, tsatke

Forkers

codeglad bajajra

xdb's Issues

engine: example database test setup

Create an example database (something like a wine-database or a retail thingy, something with a bit of data in it), then create a few simple SQL queries against it so we can be sure our algorithms work.

We have xqueries/test_db, but this needs to be included in this project as test suite.

Depends on xqueries/test_db#1

xdb: implement public API

Implement Go API to directly interact with a database.

Should be usable similar to the following.

db, err := xdb.Open(myAferoDbFile)
if err != nil {
    return err // db must not be used if opening failed
}
db.CreateTable(xdb.Table{
    Name: "myTable",
    Columns: []xdb.CreateTableColumn{
        {Name: "col1", Type: xdb.TypeInteger},
        {Name: "col2", Type: xdb.TypeString},
    },
})
  • open / create ( #39 )
  • create table ( #38 )
  • select ( #40 )
  • insert ( #41 )
  • delete ( #42 )
  • drop ( #43 )

engine: cleanly resolve literals to values from intermediate table

Currently, when evaluating a selection, the engine checks every column of the intermediate table for one whose name matches the left or right side of the selection filter expression, at least if either of them is a string literal.
This should be done cleanly in evaluateLiteral, with the intermediate table stored in the context, so that evaluateLiteral can refer to that table and either return a value from that table, or a string or numeric value representing the given expression. That way, we don't have to resolve literals to table values in each place this is needed, but can do it in one central place.
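
A minimal sketch of the idea, not the engine's actual code: the types Table, ExecutionContext and Value are hypothetical simplifications, and only the name evaluateLiteral comes from the issue. It shows the literal being resolved in one place, first against the intermediate table from the context, then as a numeric value, and finally as a string.

```go
package main

import (
	"fmt"
	"strconv"
)

// Table is a hypothetical, simplified intermediate table.
type Table struct {
	Cols []string
	Rows [][]string
}

// ExecutionContext is a hypothetical context that carries the intermediate
// table and the index of the row currently being evaluated.
type ExecutionContext struct {
	Intermediate *Table
	Row          int
}

// Value is either a numeric or a string value.
type Value struct {
	IsNumeric bool
	Num       float64
	Str       string
}

// evaluateLiteral resolves a literal centrally: a matching column of the
// intermediate table wins, otherwise the literal itself becomes the value.
func evaluateLiteral(ctx ExecutionContext, lit string) Value {
	if t := ctx.Intermediate; t != nil && ctx.Row >= 0 && ctx.Row < len(t.Rows) {
		row := t.Rows[ctx.Row]
		for i, col := range t.Cols {
			if col == lit && i < len(row) {
				return parseValue(row[i])
			}
		}
	}
	return parseValue(lit)
}

// parseValue returns a numeric value if s parses as a number, else a string.
func parseValue(s string) Value {
	if n, err := strconv.ParseFloat(s, 64); err == nil {
		return Value{IsNumeric: true, Num: n}
	}
	return Value{Str: s}
}

func main() {
	tbl := &Table{Cols: []string{"name", "age"}, Rows: [][]string{{"alice", "30"}}}
	ctx := ExecutionContext{Intermediate: tbl, Row: 0}
	fmt.Printf("%+v\n", evaluateLiteral(ctx, "age"))     // resolved from the table
	fmt.Printf("%+v\n", evaluateLiteral(ctx, "unknown")) // falls back to a string
}
```

With this shape, the selection code would only call evaluateLiteral for both sides of the filter expression and never inspect table columns itself.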

cluster: race condition in raft/cluster

Describe the bug
The raft/cluster package has a message queue channel which passes on the incoming messages through it like this:

c.messages <- incomingPayload{
    origin:  conn,
    payload: data,
}

When this channel is closed asynchronously from its parent function, while a send like the one above may still be in flight, a race condition results.

To Reproduce
Reproduced while cluster.Close() is called, in the raft branch.

Expected behavior
Race conditions must not exist, ever!
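
One common way to avoid this class of race, sketched under assumptions: the message channel is never closed; shutdown is signalled through a separate done channel that senders select on. The cluster type, its fields other than messages, and the deliver method are hypothetical and simplified; the connection is represented by a string.

```go
package main

import (
	"fmt"
	"sync"
)

type incomingPayload struct {
	origin  string // stands in for the connection
	payload []byte
}

// cluster is a hypothetical, reduced version of the real type.
type cluster struct {
	messages  chan incomingPayload
	done      chan struct{}
	closeOnce sync.Once
}

func newCluster() *cluster {
	return &cluster{
		messages: make(chan incomingPayload),
		done:     make(chan struct{}),
	}
}

// deliver hands a message to the queue. It reports false if the cluster was
// closed before a receiver took the message.
func (c *cluster) deliver(origin string, data []byte) bool {
	select {
	case c.messages <- incomingPayload{origin: origin, payload: data}:
		return true
	case <-c.done:
		return false
	}
}

// Close signals shutdown. The messages channel itself is never closed, so a
// concurrent sender can never send on a closed channel.
func (c *cluster) Close() {
	c.closeOnce.Do(func() { close(c.done) })
}

func main() {
	c := newCluster()
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			c.deliver(fmt.Sprintf("conn-%d", id), []byte("ping"))
		}(i)
	}
	c.Close() // nobody receives, so every sender unblocks through done
	wg.Wait()
	fmt.Println("all senders returned")
}
```

Running the test suite with the race detector enabled (go test -race) is the usual way to confirm that a change like this removes the reported race.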

network: avoid repeated closing of already closed connection

As described in #53, connections are being closed multiple times. A workaround is to check on close whether the connection is already closed and, if so, perform a no-op.

With this issue, the reason for multiple calls to Close from different locations or even the same location should be found and eliminated, so that connections are closed exactly once.
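
As an illustration of the workaround (not of the final fix), here is a minimal sketch using sync.Once. The wrapper onceCloser and the fakeConn stand-in are hypothetical. The call counter is an addition meant to help locate the redundant Close calls that this issue wants eliminated.

```go
package main

import (
	"fmt"
	"io"
	"sync"
	"sync/atomic"
)

// onceCloser is a hypothetical wrapper that forwards Close to the wrapped
// closer exactly once and counts every call for diagnostics.
type onceCloser struct {
	inner io.Closer
	once  sync.Once
	err   error
	calls int32
}

// Close closes the wrapped closer on the first call. Later calls are no-ops
// that return the result of the first call.
func (c *onceCloser) Close() error {
	atomic.AddInt32(&c.calls, 1)
	c.once.Do(func() { c.err = c.inner.Close() })
	return c.err
}

// Calls reports how often Close was invoked on the wrapper.
func (c *onceCloser) Calls() int { return int(atomic.LoadInt32(&c.calls)) }

// fakeConn stands in for a connection and counts real closes.
type fakeConn struct{ closed int }

func (f *fakeConn) Close() error { f.closed++; return nil }

func main() {
	f := &fakeConn{}
	c := &onceCloser{inner: f}
	for i := 0; i < 3; i++ {
		_ = c.Close()
	}
	fmt.Println("underlying closes:", f.closed, "Close calls:", c.Calls())
}
```

A call count above one in a test run points at a code path that still closes the connection redundantly.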

engine: implement Serializer for all types

Currently, only a few types implement the Serializer interface. Implement the Serializer interface for all types that require it. These include:

  • DateType
  • IntegerType
  • RealType

FunctionType should not be serializable (for now).

For an example on how to implement the Serializer interface, have a look at BoolType or StringType.
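
The project's Serializer interface is not shown in this issue, so the following is only a sketch under assumptions: the interface int64Serializer, its method names, and the fixed 8-byte big-endian encoding are all hypothetical. The existing BoolType and StringType implementations remain the authoritative reference.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// int64Serializer is a hypothetical stand-in for the project's Serializer
// interface, reduced to a single value type.
type int64Serializer interface {
	Serialize(v int64) ([]byte, error)
	Deserialize(data []byte) (int64, error)
}

// IntegerType here is a simplified placeholder, not the engine's type.
type IntegerType struct{}

// Serialize encodes v as 8 bytes in big-endian order (an assumed format).
func (IntegerType) Serialize(v int64) ([]byte, error) {
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, uint64(v))
	return buf, nil
}

// Deserialize decodes a value written by Serialize.
func (IntegerType) Deserialize(data []byte) (int64, error) {
	if len(data) != 8 {
		return 0, errors.New("integer: expected exactly 8 bytes")
	}
	return int64(binary.BigEndian.Uint64(data)), nil
}

func main() {
	var s int64Serializer = IntegerType{}
	data, _ := s.Serialize(-42)
	v, err := s.Deserialize(data)
	fmt.Println(v, err)
}
```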

engine: implement engine

Implement an engine that can execute our IR.

Scope:

  • selection support ( #14 )
    • fix literal evaluation ( #13 )
  • define file format ( #3 )
  • file system access ( #16 )
  • full table scan ( #17 )
  • e2e tests ( #18 )
  • db configuration ( #20 )
  • support for PCTFREE and overflow pages ( #19 )
  • introduce transaction manager ( #21 )
  • profiling ( #23 )
  • builtins ( #2 )
  • add documentation documents (xdbdoc)

inspect: table debug

Print tables as so:

 table TableName
+---------+----------+
|    Col1 |  Col2    |
+---------+----------+
|         |          |

If the table doesn't exist, an error is returned. Otherwise, the table is printed row by row.
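
A rough sketch of such a printer, assuming a simplified, hypothetical Table type with string cells and a map as the table registry. Cells are right-aligned and widths are computed in bytes, which is only correct for ASCII content.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Table is a hypothetical, simplified table with string cells.
type Table struct {
	Name string
	Cols []string
	Rows [][]string
}

// ErrNoSuchTable is returned when the requested table is not known.
var ErrNoSuchTable = errors.New("table does not exist")

// renderTable formats the named table row by row, or returns an error if the
// table does not exist.
func renderTable(tables map[string]Table, name string) (string, error) {
	t, ok := tables[name]
	if !ok {
		return "", fmt.Errorf("%q: %w", name, ErrNoSuchTable)
	}
	// Each column is as wide as its longest cell, header included.
	widths := make([]int, len(t.Cols))
	for i, c := range t.Cols {
		widths[i] = len(c)
	}
	for _, r := range t.Rows {
		for i := range t.Cols {
			if i < len(r) && len(r[i]) > widths[i] {
				widths[i] = len(r[i])
			}
		}
	}
	sep := "+"
	for _, w := range widths {
		sep += strings.Repeat("-", w+2) + "+"
	}
	line := func(cells []string) string {
		var b strings.Builder
		b.WriteString("|")
		for i, w := range widths {
			cell := ""
			if i < len(cells) {
				cell = cells[i]
			}
			fmt.Fprintf(&b, " %*s |", w, cell)
		}
		return b.String()
	}
	out := []string{"table " + t.Name, sep, line(t.Cols), sep}
	for _, r := range t.Rows {
		out = append(out, line(r))
	}
	out = append(out, sep)
	return strings.Join(out, "\n"), nil
}

func main() {
	tables := map[string]Table{
		"TableName": {Name: "TableName", Cols: []string{"Col1", "Col2"},
			Rows: [][]string{{"1", "foo"}, {"22", "bar"}}},
	}
	s, err := renderTable(tables, "TableName")
	fmt.Println(s, err)
	_, err = renderTable(tables, "missing")
	fmt.Println(err)
}
```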

compiler: literal handling

See #55

For the AST, I'd like two new interfaces and three new structs.

type (
    ColumnReferenceExpression interface { ... }
    LiteralExpression         interface { ... }

    ConstantLiteral struct {}                  // implements LiteralExpression
    ConstantLiteralOrColumnReference struct {} // implements both LiteralExpression and ColumnReferenceExpression
    ColumnReference struct {}                  // implements ColumnReferenceExpression
)

I expect that this facilitates handling in the engine a lot.
Whoever implements this; feel free to change the names suggested above. However, the names should be somewhat consistent with what we already have.

engine: reference to alias in WHERE clause doesn't work

Describe the bug
When referencing a column alias in a WHERE clause, the respective column can't be found.

To Reproduce
See TestExample07 and remove t.Skip(), then run it.

Expected behavior
No error.

Additional context
This might be a compiler issue, however, not sure if we should invert selection and projection just to make this happen. We should first check whether the column information is available in the where clause, and if so, make use of it and resolve the correct column.

engine: full table scan

Implement full table scans. The function stub already exists, is called scanSimpleTable in the file scan.go and currently returns a not-implemented error.

inspect: interactive shell for the inspection tool

This is branched off to a new sub-project -ishell. Refer org.

General requirements:

The inspector should provide an interactive shell. When starting the inspector, a file must be provided, and the inspector operates on that file, until the process is terminated. Ctrl + C shall terminate the process cleanly.

parser: create table allows incorrect production

Describe the bug
CREATE TABLE(n,FOREIGN KEY()REFERENCES n ON DELETE CASCADE) can be parsed, and has no table name.
I don't know if the statement is otherwise incorrect, I do assume so.
However, this passes the parser and causes a nil dereference in the compiler.

Expected behavior
A parse error.

compiler: optimize constant expressions

Optimize constant expressions, so that the executor does not have to evaluate expressions like these.

  • 1 == 2 => false
  • false == true => false
  • a IS b where a and b are different constant values => false
  • 1 BETWEEN 5 AND 6 => false
  • etc.

There is a lot of room for creativity here.
Please note that optimizations require extensive testing, so that nothing is optimized incorrectly.
This means that completing this ticket requires many positive and negative test cases.
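
To make the idea concrete, here is a small constant-folding sketch. The expression types (IntLit, BoolLit, Column, Eq, Between) are hypothetical and are not the compiler's IR. The sketch only folds when all operands are constants of the same kind, and it deliberately ignores SQL NULL semantics.

```go
package main

import "fmt"

// Expr is a hypothetical expression node.
type Expr interface{}

type (
	IntLit  struct{ V int64 }
	BoolLit struct{ V bool }
	Column  struct{ Name string }
	Eq      struct{ L, R Expr }
	Between struct{ X, Lo, Hi Expr }
)

// fold replaces constant sub-expressions with their result and leaves
// everything that involves a column untouched.
func fold(e Expr) Expr {
	switch n := e.(type) {
	case Eq:
		l, r := fold(n.L), fold(n.R)
		switch lv := l.(type) {
		case IntLit:
			if rv, ok := r.(IntLit); ok {
				return BoolLit{V: lv.V == rv.V}
			}
		case BoolLit:
			if rv, ok := r.(BoolLit); ok {
				return BoolLit{V: lv.V == rv.V}
			}
		}
		return Eq{L: l, R: r}
	case Between:
		x, lo, hi := fold(n.X), fold(n.Lo), fold(n.Hi)
		xv, okX := x.(IntLit)
		lov, okLo := lo.(IntLit)
		hiv, okHi := hi.(IntLit)
		if okX && okLo && okHi {
			return BoolLit{V: lov.V <= xv.V && xv.V <= hiv.V}
		}
		return Between{X: x, Lo: lo, Hi: hi}
	}
	return e
}

func main() {
	fmt.Printf("%+v\n", fold(Eq{L: IntLit{V: 1}, R: IntLit{V: 2}}))
	fmt.Printf("%+v\n", fold(Between{X: IntLit{V: 1}, Lo: IntLit{V: 5}, Hi: IntLit{V: 6}}))
	fmt.Printf("%+v\n", fold(Eq{L: Column{Name: "a"}, R: IntLit{V: 2}}))
}
```

A negative test case in this setting is an expression that must stay unchanged, such as a comparison that involves a column.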

engine: database-specific configuration

Implement a configuration per-database. Ideally, this is included in the database file, maybe as separate table with a reserved name.

This reserved name must actually be reserved, so there need to be restrictions on CREATE statements.
Maybe this can be merged with some kind of master table, which contains meta information about the database, such as creation time of tables etc.

I imagine it to be usable kind of like this:

SELECT * FROM master WHERE key = 'PCTFREE';
+---------+-------+
| key     | value |
+---------+-------+
| PCTFREE | 60    |
+---------+-------+

literal handling

We must differentiate between a constant literal and a column reference in a projection.

The following changes should be implemented.
There is a difference between

  1. SELECT name
  2. SELECT "name"
  3. SELECT 'name'
  • For 1, name is a column reference, and must be resolved to a column of the input of the projection. If no such column exists, an error is returned.
  • For 2, "name" is a constant literal or column reference, meaning that if there exists a column name in the input of the projection, it is resolved to that column. If not, it is interpreted as a constant literal.
  • For 3, 'name' is a constant literal, which in this case is of type String.
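
The three cases can be sketched as follows. This is an illustration only: classification by quote characters on a raw token is an assumption (the real parser works on its own token types), and classify and resolve are hypothetical helpers. The kind names follow the structs proposed for the AST.

```go
package main

import (
	"fmt"
	"strings"
)

// LiteralKind distinguishes the three cases from the list above.
type LiteralKind int

const (
	ColumnReference LiteralKind = iota
	ConstantLiteralOrColumnReference
	ConstantLiteral
)

// classify returns the kind of a projection token and its unquoted text.
func classify(token string) (LiteralKind, string) {
	if len(token) >= 2 {
		switch {
		case strings.HasPrefix(token, `"`) && strings.HasSuffix(token, `"`):
			return ConstantLiteralOrColumnReference, token[1 : len(token)-1]
		case strings.HasPrefix(token, `'`) && strings.HasSuffix(token, `'`):
			return ConstantLiteral, token[1 : len(token)-1]
		}
	}
	return ColumnReference, token
}

// resolve applies the three rules against the columns of the projection
// input. isColumn tells whether text names a column or is a constant.
func resolve(token string, columns []string) (text string, isColumn bool, err error) {
	kind, name := classify(token)
	exists := false
	for _, c := range columns {
		if c == name {
			exists = true
			break
		}
	}
	switch kind {
	case ColumnReference:
		if !exists {
			return "", false, fmt.Errorf("no column %q in projection input", name)
		}
		return name, true, nil
	case ConstantLiteralOrColumnReference:
		return name, exists, nil
	default: // ConstantLiteral
		return name, false, nil
	}
}

func main() {
	cols := []string{"name"}
	for _, tok := range []string{"name", `"name"`, `"age"`, "'name'", "age"} {
		text, isColumn, err := resolve(tok, cols)
		fmt.Println(tok, text, isColumn, err)
	}
}
```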

TODO:

  • adapt parser ( #56 )
  • adapt parse tree ( #56 )
  • adapt AST ( #57 )
  • adapt compiler ( #57 )
  • adapt engine ( #58 )

engine: fine profiling

Implement a more fine-tunable and more granular profiling than the basic one we have right now.

The new implementation must either replace or extend the current one.

parser: broken select statements

Describe the bug
I see some similar bugs in https://github.com/xqueries/xdb/blob/master/internal/parser/parser_test.go

"CREATE TABLE myTable AS SELECT *",
"INSERT INTO myTable SELECT *",

I am not sure whether they are intentional or indeed are missed somehow, but I think these statements and others like these should give an error related to an invalid SELECT statement or missing table name.
Even a SELECT query without any FROM clause or literal passes.

To Reproduce
Run the tests.
Also, try SELECT *, which I think should fail.

Similar issue
Missing table name in Create statement #29

engine: implement builtin functions

The most basic builtin functions should be implemented. These are the following.

  • RANDOM
  • COUNT
  • UCASE
  • LCASE
  • NOW
  • MIN
  • MAX

Every single one needs extensive testing, as well as benchmarks (no optimization, just benchmarks).
They also need very detailed documentation on how they work and what they do exactly.

For a builtin function to be considered "implemented", there needs to be an e2e test in internal/test (in addition to required unit tests).

engine: transaction manager

Implement a database transaction manager.

I imagine the internal API to be similar to this.

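func (e Engine) createTable(tableDef TableDef) (err error) {
    tx, err := e.BeginTransaction()
    if err != nil {
        return err
    }
    // Register the finalizer before doing any work, so that every return
    // path is covered. The closure reads the named result err at return
    // time, not at the time of the defer statement.
    defer func() { e.FinalizeTransaction(tx, err) }()
    return tx.createTable(tableDef)
}
...
func (e Engine) FinalizeTransaction(tx Transaction, err error) {
    if tx == nil {
        return
    }
    if err != nil {
        tx.Rollback()
        return // do not commit after a rollback
    }
    tx.Commit()
}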

The implication is that resources probably can't be loaded directly through the page cache, but have to go through an additional layer, the transaction manager, which can convert the pages to copy-on-write and mergeable pages to allow for rollbacks and commits respectively.

make: makefile should go get linters

The Makefile currently does not go get the linters for the lint step. This should be done. However, doing so automatically could prove difficult if we consider different versions and other executables that may have the same name.

Because of that, a new step make deps should be introduced. This step must do the following.

  • go mod download
  • go mod verify
  • for every tool, such as golint, errcheck etc.
    • go get <the-tool>

While we're at it, we should replace the go test in make test with gotestsum. Race detector must still be enabled.
This is due to two reasons.

  1. the output of gotestsum has better readability for humans
  2. CI doesn't use the make file and thus doesn't need the output of make test

compiler: select returns nothing to select from

When executing a statement like SELECT "foobar" as "info", the engine returns an error nothing to select from.

It should, however, return a table as follows.

+--------+
|   info |
+--------+
| foobar |
+--------+

raft: implement raft protocol

Implement raft protocol communication as well as test tools for a cluster.

  • raft functionality
    • leader Election
    • append Entries
    • heartbeats
    • request Votes
  • test framework
    • Add functions that validate correctness
      • Check whether node has correct parameters after stop or start.
    • Shut down leader/follower functions.
    • Disable logs from the consensus(make them flag specific) and have logs from the framework indicating what's happening.
  • tests
    • normal raft operation
    • leader failure
    • leader recovery
    • multiple leader problem
    • network partitions and their recovery
    • follower join
    • follower graceful shutdown
    • mock tests where the non-mocked node is a leader
    • mock tests where the non-mocked node is a follower

inspect: page debug

As a user of the inspector, I want to be able to access every page in the database file by its ID, and I want to be able to see its contents. That is, I want to see how many cells there are on the page, how much free space there is, whether or not there is an overflow page and maybe even more.

>>> page 9
ID: 9
Cells: 12
Free: 62/64KiB
Overflow: no

I would like to see things like boolean attributes as yes and no only, and have them colored respectively (yes in green, no in red).

The page command must be able to be stateful and respond to specific commands. A typical use case is shown below:

$ xdb inspect myFile.db
> pages
[0, 1, 2, 3, 4, 5, 6, 9, 11, 12, 16]
> page 5
1522 cells
overflow: no
page 5> somecellkey            # key of a cell, this should also have autocompletion
type=data
key=somecellkey
value=somecelldata             # if this is row-data, it would be nice if it was formatted accordingly
page 5> othercell
type=page
key=othercell
page=9
page 5> k
k
> page 9
page 9>
  • page x to provide data about page x.
  • scope switching both up and down (up when a page is queried and down when k (our exit) is entered)
  • cell details inside page scope
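
The scope switching described above could be modelled as a small state machine. The sketch below is an assumption-laden illustration: the shell type, its in-memory pages map, and the output strings are hypothetical; only the page x command, the k exit, and the prompt shapes come from the transcript.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// shell is a hypothetical inspector shell holding the current scope.
type shell struct {
	pages   map[int]map[string]string // page ID -> cell key -> value
	current int
	inPage  bool
}

// prompt reflects the current scope.
func (s *shell) prompt() string {
	if s.inPage {
		return fmt.Sprintf("page %d> ", s.current)
	}
	return "> "
}

// handle processes one input line and returns the text to print.
func (s *shell) handle(line string) string {
	fields := strings.Fields(line)
	if len(fields) == 0 {
		return ""
	}
	switch {
	case fields[0] == "page" && len(fields) == 2:
		id, err := strconv.Atoi(fields[1])
		if err != nil {
			return "invalid page ID: " + fields[1]
		}
		if _, ok := s.pages[id]; !ok {
			return fmt.Sprintf("no such page: %d", id)
		}
		s.current, s.inPage = id, true // enter the page scope
		return fmt.Sprintf("cells: %d", len(s.pages[id]))
	case s.inPage && fields[0] == "k":
		s.inPage = false // leave the page scope
		return ""
	case s.inPage:
		v, ok := s.pages[s.current][fields[0]]
		if !ok {
			return "no such cell: " + fields[0]
		}
		return "key=" + fields[0] + "\nvalue=" + v
	}
	return "unknown command: " + fields[0]
}

func main() {
	s := &shell{pages: map[int]map[string]string{
		5: {"somecellkey": "somecelldata"},
		9: {},
	}}
	for _, in := range []string{"page 5", "somecellkey", "k", "page 9"} {
		fmt.Print(s.prompt(), in, "\n")
		if out := s.handle(in); out != "" {
			fmt.Println(out)
		}
	}
}
```

Keeping the scope in a single struct makes it straightforward to derive both the prompt and the set of completions from the same state.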

tool: create tool for debugging a database file

A database file currently is not readable.
To manually inspect the contents of such a file, we want to have a tool to use, which displays information about the file.
We want this tool to be usable as a sub-command of the xdb command, such as xdb debug.

The developer should figure out which properties are most important, and which sub-commands of the debug command would make sense to have.
As an example, here are some example commands and outputs.

$ xdb debug file.db
Summary of file.db (size 256KiB)
  Pages:  4
  Tables: 1
  Unused: 255KiB
$ xdb debug tables file.db
Summary of tables in file.db
  Tables: 1
Summary of 'myTable'
  Schema:
    CREATE TABLE myTable (
      id CHAR,
      created DATE
    )
  Records: 1

and many more.

inspect: overview command

Inside the interactive shell, there should be an overview over the most important things in the table, such as the following.

  • database size
  • amount of tables
  • amount of indexes (since we don't support that yet, this would always be 0)
  • amount of pages

driver: implement database comm endpoint

The database needs to expose an endpoint where it can receive client (driver) queries.
The communication needs to be defined and must explicitly support transferring tables.

update doc/overview.md

That document is outdated and incorrect.
We must change that before the first release, and add a detailed explanation on components.

compiler: optimizations

Implement additional compiler optimizations.

For trickier compiler optimizations, add documentation in xdbdoc, with mathematical proof that the optimization works and is equivalent to the non-optimized query.

inspect: The xdb file inspection tool

A database file currently is not readable.
To manually inspect the contents of such a file, we want to have a tool to use, which displays information about the file.

The developer should figure out which properties are most important, and which sub-commands of the inspect command would make sense to have.

xdb inspect will be a tool that can inspect the database file (.xdb)

This can be started by xdb inspect dbFile.xdb, which should start an interactive CLI that enables a multitude of explorations of the database file. The tool is strictly read-only: it can only report what the database file contains. No modification operations are permitted, for the sake of safety and simplicity.

General requirements of the tool:

  1. The tool must be able to read a DB file and provide the details available via the file in a human friendly format.
  2. The CLI is interactive for some commands which enables the CLI to be stateful. Find in depth explanation with specific commands that support this.
  3. The commands that are supported will be in the scope of the data available in the DB file and the supporting engine implementation and will be explained in further requirements.
  4. Auto-completion - auto completion for table names on Tab (or as supported by the CLI library).
  5. Command - Help: Gives a basic explanation on how to use the CLI and on specific commands too.
  6. Command - Overview: Gives a basic idea of the entire file; data on how much space is used, how many tables exist etc - detail in ticket.
  7. Command - page: Usage - page numberOfPage. Gives cell level data on the page, detail in ticket.
  8. Command - table: Usage - table TableName. Display table data as provided by engine.

Package location:

The inspector should reside in cmd/xdb/inspect.go as a new cobra command.

The actual inspector implementation should reside in internal/inspector. We can think about moving it to public API if developers request that, however, right now that would only increase the development efforts that have to be taken.

Sample intended output:

$ xdb inspect file.db
Summary of file.db (size 256KiB)
  Pages:  4
  Tables: 1
  Unused: 255KiB
$ xdb inspect tables file.db
Summary of tables in file.db
  Tables: 1
Summary of 'myTable'
  Schema:
    CREATE TABLE myTable (
      id CHAR,
      created DATE
    )
  Records: 1

and many more.

Commands supported:

  • help - gives out details about what the Inspector can do.
  • overview - displays the space occupied, the number of tables and other details about the db.
  • table TableName - reads the table details.
  • page pageID - outputs all the captured details of the page.

Supporting issues:

driver: implement driver

The driver needs to be implemented, as well as the communication interface to the database.

  • driver logic ( #34 )
  • driver-database communication ( #33 )
