crunch's Issues

Add daemon plugin

Add support for a daemon plugin which can manipulate a table in order to perform maintenance operations. Initially this will be limited to file consolidation.

Two strategies need to be examined. The first is to have the daemon use the handler interface to interact with the table, similar to the handler_socket plugin. The second is to link directly against the ha_crunch library and manipulate the table outside of the MariaDB handler structure. The main issue with the second approach is that locks still need to be acquired in order to safely consolidate the files.

Support unsigned integers properly

Right now unsigned integers are not supported. This is simply because we only check the field type in getCapnpTypeFromField and build_row. We need to check whether the unsigned bit is set and create the schema to store the value appropriately, as sketched below.
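A minimal sketch of the intended check, assuming the MariaDB plugin headers are available; the helper name and the fallback type are illustrative, not crunch's actual getCapnpTypeFromField:

// Hedged sketch only: map an integer Field to a cap'n proto type name while
// honouring the unsigned bit. Assumes the MariaDB plugin headers (field.h).
static const char *capnpIntTypeForField(Field *field) {
  bool is_unsigned = field->flags & UNSIGNED_FLAG;  // the check that is missing today
  switch (field->type()) {
    case MYSQL_TYPE_TINY:     return is_unsigned ? "UInt8"  : "Int8";
    case MYSQL_TYPE_SHORT:    return is_unsigned ? "UInt16" : "Int16";
    case MYSQL_TYPE_INT24:    // mediumint: no 24-bit cap'n proto type, widen to 32
    case MYSQL_TYPE_LONG:     return is_unsigned ? "UInt32" : "Int32";
    case MYSQL_TYPE_LONGLONG: return is_unsigned ? "UInt64" : "Int64";
    default:                  return is_unsigned ? "UInt64" : "Int64";
  }
}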

Add support for online table alters

The idea is to add support for online table alters by creating a new version of the schema. We can then migrate data on read. This puts the work on reads instead of causing a table rebuild when a column is changed. During file consolidation, older data files can be upgraded to the latest version of the schema.

If only the column order is being changed, nothing has to be done. Even if the on-disk storage and the cap'n proto schema are in a different order, the fields are set through the field interface, so no changes are needed.

If a column name changes, we can just create a new cap'n proto schema with the new names; as long as the order and data types stay the same, the data on disk doesn't need to change.

Changes are needed for:

  • ADD_COLUMN
  • ALTER_COLUMN_DEFAULT
  • ALTER_COLUMN_NULLABLE
  • ALTER_COLUMN_FORMAT
  • ALTER_COLUMN_STORAGE_TYPE

Implement the following handler methods (a minimal sketch of the first one follows the list):

  • check_if_supported_inplace_alter(TABLE *altered_table, Alter_inplace_info *ha_alter_info)
  • prepare_inplace_alter_table(TABLE *altered_table, Alter_inplace_info *ha_alter_info)
  • inplace_alter_table(TABLE *altered_table, Alter_inplace_info *ha_alter_info)
  • commit_inplace_alter_table(TABLE *altered_table, Alter_inplace_info *ha_alter_info, bool commit)
  • notify_table_changed()
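A rough sketch of what the first method could look like; the flag names follow the Alter_inplace_info constants listed above, and both the supported set and the no-lock result are placeholders rather than a final decision:

// Hedged sketch: advertise which alters crunch could do in place.
enum_alter_inplace_result
ha_crunch::check_if_supported_inplace_alter(TABLE *altered_table,
                                            Alter_inplace_info *ha_alter_info) {
  // Operations we expect to handle by writing a new schema version and
  // migrating rows on read; anything else falls back to a copying alter.
  const auto supported = Alter_inplace_info::ADD_COLUMN |
                         Alter_inplace_info::ALTER_COLUMN_DEFAULT;
  if (ha_alter_info->handler_flags & ~supported)
    return HA_ALTER_INPLACE_NOT_SUPPORTED;
  return HA_ALTER_INPLACE_NO_LOCK;  // readers migrate old rows on the fly
}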

Add support for Updates

Add support for Updates.

Updates could be done in place by creating the new message and just memcpy'ing it over the old record.

New design:

An update is a delete plus an insert; a sketch follows. See #18

This needs #13
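A minimal sketch of that design, reusing crunch's existing delete_row() (#18) and write_row() (#1); the exact handler signatures vary slightly between MariaDB versions:

// Hedged sketch: treat an UPDATE as a delete of the old row plus an insert of
// the new one, reusing the existing handler methods.
int ha_crunch::update_row(const uchar *old_data, uchar *new_data) {
  int rc = delete_row(old_data);  // record the old row in the delete file (#18)
  if (rc != 0)
    return rc;
  return write_row(new_data);     // append the new version to the data file (#1)
}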

Add order by for tests to work around travis ci

Add ORDER BY to tests to work around Travis CI failures. Sometimes the resulting transaction files end up in a different order in the Travis CI VM. I am unable to reproduce this locally, in Docker, or in KVMs. For now, to prevent the false test failures, we will add ORDER BY to tests with multiple result rows.

Convert column names to capnp format

Cap'n proto does not support underscores in field names and enforces camelCase. We need to convert column names, which often contain underscores, to camelCase; a possible helper is sketched below.
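A possible standalone helper for this conversion; the function name is illustrative and not necessarily what crunch ends up using:

#include <cctype>
#include <string>

// Convert a MySQL column name (often snake_case) to a cap'n proto style
// camelCase field name.
static std::string camelCaseFieldName(const std::string &column) {
  std::string out;
  bool upper_next = false;
  for (char c : column) {
    if (c == '_') {
      upper_next = true;               // drop the underscore, capitalise what follows
    } else if (upper_next) {
      out += static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
      upper_next = false;
    } else {
      out += c;
    }
  }
  if (!out.empty())                    // cap'n proto field names start lower-case
    out[0] = static_cast<char>(std::tolower(static_cast<unsigned char>(out[0])));
  return out;
}

// e.g. camelCaseFieldName("created_at_utc") == "createdAtUtc"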

Investigate flaky mediumint to bigint test

The alter table test with multiple inplace alters that changes a mediumint to a bigint is failing in Travis CI. This failure cannot be reproduced anywhere other than Travis CI. The test was disabled but should be investigated.

25b4745

Sample failure: https://travis-ci.org/Shelnutt2/crunch/builds/343135301

exception on rnd_next ./test/t1: capnp/layout.c++, line: ../storage/crunch/src/crunch.cpp:346, exception_line: 2159, type: 0, e.what(): expected boundsCheck(segment, ptr, ref->structRef.wordSize()); Message contained out-of-bounds struct pointer.
2018-02-18 22:32:29 139966787450624 [ERROR] mysqld: Got error -44 "Internal error < 0 (Not system error)" from storage engine Crunch

Switch CI tests to use docker image for build dependencies

Switch CI tests to use a docker image for build dependencies. This will avoid having to compile cap'n proto for every test run, which adds 5-10 minutes to the testing. We can also skip installing the Ubuntu dependencies and expand testing to other gcc/clang versions with little overhead.

Add transaction support

Add transaction support.

This involves storing transaction data for deletes and writes in new files. On commit of a transaction, the files are closed and moved from the transaction's working directory to the main table folder; a sketch of the commit step follows.

The transaction folder also needs to be cleared out on startup.
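A hedged sketch of the commit step; the directory layout and file extensions (.capnpd, .deleted) are assumptions for illustration only:

#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Move a transaction's files from its working directory into the table folder.
static bool commitTransactionFiles(const fs::path &txn_dir,
                                   const fs::path &table_dir,
                                   const std::string &txn_id) {
  std::error_code ec;
  for (const char *ext : {".capnpd", ".deleted"}) {
    fs::path src = txn_dir / (txn_id + ext);
    if (!fs::exists(src))
      continue;                                        // e.g. no deletes in this transaction
    fs::rename(src, table_dir / src.filename(), ec);   // atomic on the same filesystem
    if (ec)
      return false;
  }
  return true;
}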

Move data files into data folder

We should move data files into a data folder. This would make it easier to atomically consolidate files. Right now we rename/move all existing files into a "consolidate folder", then we rename the new data file from transactions into the main folder, then we delete the consolidate folder. That is too many operations, and not atomic enough in case of failure.

What we want to do is create data_dir_X directories and keep data as a symlink pointing at the active one. Switching the symlink can be done atomically (a sketch follows the link):
http://blog.moertel.com/posts/2005-08-22-how-to-change-symlinks-atomically.html
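A sketch of the swap described in the linked post, applied to the layout above; the path names are illustrative. Create the new symlink under a temporary name, then rename() it over "data" - rename(2) replaces the old link atomically on the same POSIX filesystem:

#include <cstdio>    // std::rename
#include <string>
#include <unistd.h>  // symlink, unlink

static bool swapDataSymlink(const std::string &table_dir,
                            const std::string &new_data_dir /* e.g. "data_dir_2" */) {
  const std::string tmp_link  = table_dir + "/data.tmp";
  const std::string data_link = table_dir + "/data";
  unlink(tmp_link.c_str());  // remove any leftover temporary link
  if (symlink(new_data_dir.c_str(), tmp_link.c_str()) != 0)
    return false;
  return std::rename(tmp_link.c_str(), data_link.c_str()) == 0;  // atomic swap
}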

Add advanced data/delete file support

Initial delete support was added in #18. Initial writes in #1.

This issue is to extend the data/delete support so that multiple files can contain the data and deletes. The basic idea is that any time an update or delete statement is issued, we stop writing new rows to the current data file, start a new data file, and move the deletes to the new file.

Handle crash recovery

By nature of using write-once files per transaction, we have at most two atomic operations per table during a transaction commit: one rename of the data file and one rename of the delete file.

From a single-table perspective (the simplest transaction), if a crash occurs after only one of the atomic operations has completed, on startup we must roll back and delete this partial transaction. The affected transaction can be identified by comparing the data folder to the transactions folder: since all of a transaction's files share the same name and differ only in extension, the partially committed transaction can be found.

This must be done only during crash recovery, since in normal operation a half-committed state is expected whenever a transaction is mid-flight.

Transactions can span multiple tables, and in that case the same basic procedure applies. All tables get the same transaction id (epoch nanoseconds + uuid), so checking across all tables for partial commits is doable. This, however, only works if the transaction involved two or more crunch tables; if the transaction is cross-engine, we cannot do this effectively without XA commits. A sketch of the single-table check follows.
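A rough sketch of the single-table startup check, using std::filesystem; the directory layout is an assumption and the rollback itself (deleting the partial files) is omitted:

#include <filesystem>
#include <set>
#include <string>

namespace fs = std::filesystem;

// A transaction id that still has files in the transactions folder but already
// has a file with the same stem in the data folder was only partially committed.
static std::set<std::string> findPartialTransactions(const fs::path &data_dir,
                                                     const fs::path &txn_dir) {
  std::set<std::string> committed, pending, partial;
  for (const auto &entry : fs::directory_iterator(data_dir))
    committed.insert(entry.path().stem().string());  // same name, different extensions
  for (const auto &entry : fs::directory_iterator(txn_dir))
    pending.insert(entry.path().stem().string());
  for (const auto &id : pending)
    if (committed.count(id))
      partial.insert(id);  // one half reached the data folder, the other did not
  return partial;
}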

Setting Default Value for Capnp Schema

Marko was able to show me how InnoDB sets the default value in its instant ADD COLUMN. They basically call ->set_default() on the field and then read the field value back. We should be able to do the same; a sketch follows.
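A minimal sketch of that approach against the MariaDB Field API; how crunch would then feed the value into the cap'n proto schema is left out, and the helper assumes the field points at a valid record buffer:

// Hedged sketch: materialise the default with Field::set_default(), then read
// it back through the normal accessors.
static void readColumnDefault(Field *field, String *out) {
  field->set_default();    // writes the column default into the record buffer
  if (field->is_null())
    out->length(0);        // NULL default: leave the output empty
  else
    field->val_str(out);   // read the default back as text
}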

Add testing framework

Need to add support for a testing framework. We need the ability to add mysql tests for this storage engine.

MyISAM handles this with test executables; however, I'd like to use the MySQL testing framework so tests can be run inside the database to ensure all functions are implemented correctly.

Speed up CI builds

CI builds can be sped up if we limit what we build along with MariaDB:

  • disable tokudb
  • disable rocksdb
  • disable mroonga
  • disable spider
  • disable sphinx
  • disable federated
  • disable federatedx
  • disable connect
  • disable oqgraph

Support multiple charsets

Cap'n proto expects all text to be UTF-8. Right now, when we get blob or char/varchar data, we assume it is UTF-8. Instead of assuming, a conversion should be done.

Consolidation Locking Enhancements

Right now table locks are all that is supported. When #13 is implemented, we will have finer-grained control. This issue will happen after #13.

The basic idea is that we can use shared read-only locks for everything but deletes. For deletes, if we want to allow read-only locks, we'd need to keep a mapping of (old file, old position) to (new file, new position) so we can port deletes that happen while consolidation is running; a sketch follows. The commit phase always requires an exclusive lock.
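A sketch of the bookkeeping this would need; the types and member names are illustrative only:

#include <cstdint>
#include <map>
#include <string>
#include <utility>

using RowLocation = std::pair<std::string, uint64_t>;  // (file name, offset)

struct ConsolidationRemap {
  std::map<RowLocation, RowLocation> old_to_new;

  // Record each row's new location as it is copied into the consolidated file.
  void record(const RowLocation &old_loc, const RowLocation &new_loc) {
    old_to_new[old_loc] = new_loc;
  }

  // At commit time (exclusive lock), port a delete that arrived mid-consolidation.
  bool port(const RowLocation &old_loc, RowLocation *new_loc) const {
    auto it = old_to_new.find(old_loc);
    if (it == old_to_new.end())
      return false;  // the row was never copied (e.g. already deleted)
    *new_loc = it->second;
    return true;
  }
};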

Timestamp to Datetime conversions incorrect

Timestamp-to-datetime conversions that happen during message upgrades are not correct. Timestamps are stored in UTC, and MySQL converts them to the server timezone on SELECT. During a normal offline ALTER TABLE, timestamps are converted to the server timezone and then written to the datetime field. We should do the same.

Add support for info()

For info() we can compute the data file size by adding a new member to the data_struct that holds the file size. We can then also maintain a class variable for the total size, updated on each run of findTableFiles; a sketch follows.
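A sketch of the bookkeeping; the struct and member names stand in for crunch's actual per-file data_struct and are not the real identifiers:

#include <cstdint>
#include <string>
#include <vector>

struct DataFileInfo {
  std::string path;
  uint64_t size_bytes;  // new member populated when the file is discovered
};

struct CrunchFileStats {
  std::vector<DataFileInfo> data_files;
  uint64_t total_data_size = 0;  // refreshed on each run of findTableFiles

  void addDataFile(std::string path, uint64_t size_bytes) {
    total_data_size += size_bytes;
    data_files.push_back({std::move(path), size_bytes});
  }
};

info() could then report the running total as the table's data size.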

Add support for deletes

Delete support is needed. Deletes can be done in multiple ways.

  1. Add a delete indicator to the record; this would essentially be an update with a hidden field.
  2. Zero out the entire message in the file; this means we have to handle gaps between messages when reading/scanning the file.
  3. Maintain a separate file with the list of deleted rows.

"1)" or "2) "are pretty equivalent. The advantage to 2 is during a table scan one does not have to parse the message only to find it has been deleted. It might also be that option 1 makes roll back easier. However with 1 or 2 we still need to maintain a list of the ongoing rows touched in the transactions.

"3)" Does not seem to have a large benefit. If we keep the rows separate, then we just have to read that into memory and still do a comparison. The only upside compared to 1, is we don't have to parse capnp proto message to see if it is deleted or not, we can store the file offset and skip that way.

With option 2 we can also have a daemon process that periodically reorganizes a closed table, removing the zeroed-out space between messages and truncating the file size. A sketch of the option-2 read-path skip follows.
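An illustrative sketch only of how the option-2 read path could skip zeroed regions; the word-level framing here is deliberately simplified and does not reflect crunch's real cap'n proto segment layout:

#include <cstdint>

static const uint64_t *skipDeletedWords(const uint64_t *pos, const uint64_t *end) {
  while (pos < end && *pos == 0)  // deleted messages were overwritten with zeros
    ++pos;
  return pos;                     // start of the next live message, or end of file
}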

Advanced inplace alter table support

In #50, online (inplace) alter table support was added for column additions, name changes, and dropping columns.

Changing a column's datatype to a non-compatible datatype (e.g. int to string) currently requires a rebuild. This should be handled with online alters. The basic requirement is a conversion process from one data type to another (i.e. itoa or atoi); on reads the value is then converted on the fly, and when a consolidation happens the data on disk should be updated so the conversion is no longer required. A minimal sketch of such a conversion follows.
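A minimal sketch of one conversion pair (an int column altered to a string column, and the reverse); the per-schema-version dispatch is omitted and the function names are illustrative:

#include <cstdint>
#include <string>

static std::string convertIntToString(int64_t old_value) {
  return std::to_string(old_value);  // the "itoa" direction
}

static int64_t convertStringToInt(const std::string &old_value) {
  return std::stoll(old_value);      // the "atoi" direction; invalid input throws
}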

Update documentation - Create docs folder

Update documentation - Create docs folder.

  • Move architecture documentation to docs/Architecture.md
  • Add feature checklist to readme
  • Add limitations to readme
  • Add non-production disclaimer to readme

Consider storing more field details in schema

Consider changing the on-disk cap'n proto schema so that each column is stored as a field struct. There is a lot of metadata we are not storing right now, such as a column's nullability or default values. Default values could be stored if they are constants, but if they are expressions we must rely on MariaDB storing them in the .frm file.

In order to do auto table discovery, we have to store everything needed in the table data (data files or schema files) ourselves.

The limitation of inplace alter table when adding a nullable column with a default (where existing rows always read as null) exists because we are not storing this information.

The downside: right now each row is stored simply. Does it really make sense to pack all this extra information into every single row? It would greatly increase disk space and the processing of data that is constant (for a given schema version).

Perhaps we introduce a new data file that goes along with the schema and contains the table's metadata? We keep the "rows" struct compact and simple, and create a new struct to represent the table and all of its metadata. The advantage is that the data files stay compact and we only have to write the "table metadata" once per schema change. The downside is that any single row data file would be missing the data required for logic, such as whether a column with no value uses a default expression. New struct/metadata files also increase the complexity. Right now storage is simple, but the downside is that virtual columns are not supported.

I'm leaning toward the new struct and new data files. It does not make sense to expand the row structure so that every column is a field struct containing all the metadata; it'd be a massive amount of duplication, and we have no compression (yet)!
