Giter VIP home page Giter VIP logo

tesseract's Introduction

Tesseract: Online Schema Evolution is (almost) Free for Snapshot Databases

This repository implements Tesseract. See details in our VLDB 2023 paper below. If you use our work, please cite:

Online Schema Evolution is (Almost) Free for Snapshot Databases
Tianxun Hu, Tianzheng Wang and Qingqing Zhou.
PVLDB 16(2) (VLDB 2023).

Software dependencies

Ubuntu

apt-get install -y cmake gcc-11 g++-11 clang-10 libc++-8-dev libc++abi-8-dev
apt-get install -y libnuma-dev libibverbs-dev libgflags-dev libgoogle-glog-dev liburing-dev

Environment configurations

Make sure you have enough huge pages.

  • Tesseract uses mmap with MAP_HUGETLB (available after Linux 2.6.32) to allocate huge pages. Almost all memory allocations come from the space carved out here. Assuming the default huge page size is 2MB, the command below will allocate 2x MB of memory:
sudo sh -c 'echo [x pages] > /proc/sys/vm/nr_hugepages'

This limits the maximum for --node-memory-gb to 10 for a 4-socket machine (see below).

  • mlock limits. Add the following to /etc/security/limits.conf (replace "[user]" with your login):
[user] soft memlock unlimited
[user] hard memlock unlimited

Re-login to apply.


Build it

We do not allow building in the source directory. Suppose we build in a separate directory:

$ mkdir build
$ cd build
$ cmake ../ -DCMAKE_BUILD_TYPE=[Debug/Release/RelWithDebInfo]
$ make -jN

Currently the code can compile under Clang 10.0+. E.g., to use Clang 10.0, issue the following cmake command instead:

$ CC=clang-10.0 CXX=clang++-10.0 cmake ../ -DCMAKE_BUILD_TYPE=[Debug/Release/RelWithDebInfo]

After make there will be six executables under build:

tesseract_DDL_COPY that runs DDLs with Tesseract approach;

tesseract_DDL_LAZY_COPY that runs DDLs with lazy approach;

tesseract_DDL_OPT_LAZY_COPY that runs DDLs with Tesseract-lazy approach;

tesseract_DDL_BLOCK that runs DDLs with blocking approach;

tesseract_DDL_SI that runs DDLs with naive SI approach;

tesseract_NO_DDL that runs with no DDLs;

Run it

$run.sh \
       [executable] \
       [benchmark] \
       [scale-factor] \
       [num-threads] \
       [duration (seconds)] \
       "[other system-wide runtime options]" \
       "[other benchmark-specific runtime options]"`

Run example

# Add column with Tesseract approach under TPC-CD:
./run.sh ./tesseract_DDL_COPY tpcc_org 50 31 10 "-node_memory_gb=30 -cdc_threads=5 -scan_threads=3 -enable_cdc_schema_lock=0 -enable_ddl_keys=0 -pcommit_queue_length=500000 -ddl_total=1 -enable_parallel_scan_cdc=1 -print_interval_ms=1000 -cdc_physical_workers_only=1 -scan_physical_workers_only=1 -client_load_per_core=4500 -latency_stat_interval_ms=25 -enable_large_ddl_begin_timestamp=1" "-d 2 -e 0 -s 1"

# Add column with Tesseract-lazy approach under TPC-CD:
./run.sh ./tesseract_DDL_LAZY_COPY tpcc_org 50 31 10 "-node_memory_gb=30 -cdc_threads=5 -scan_threads=3 -enable_cdc_schema_lock=0 -enable_ddl_keys=1 -pcommit_queue_length=500000 -enable_lazy_background=1 -ddl_total=1 -enable_parallel_scan_cdc=1 -print_interval_ms=1000 -cdc_physical_workers_only=1 -scan_physical_workers_only=1 -client_load_per_core=4500 -latency_stat_interval_ms=25 -enable_lazy_on_conflict_do_nothing=1 -late_background_start_ms=0" "-d 2 -e 0 -s 1"

# Add column with lazy approach under TPC-CD:
./run.sh ./tesseract_DDL_LAZY_COPY tpcc_org 50 31 10 "-node_memory_gb=30 -cdc_threads=5 -scan_threads=3 -enable_cdc_schema_lock=0 -enable_ddl_keys=1 -pcommit_queue_length=500000 -enable_lazy_background=1 -ddl_total=1 -enable_parallel_scan_cdc=1 -print_interval_ms=1000 -cdc_physical_workers_only=1 -scan_physical_workers_only=1 -client_load_per_core=4500 -latency_stat_interval_ms=25 -enable_lazy_on_conflict_do_nothing=1 -late_background_start_ms=0" "-d 2 -e 0 -s 1"

# Add column with Blocking approach under TPC-CD:
./run.sh ./tesseract_DDL_BLOCK tpcc_org 50 31 10 "-node_memory_gb=30 -cdc_threads=5 -scan_threads=3 -enable_cdc_schema_lock=0 -enable_ddl_keys=0 -pcommit_queue_length=500000 -ddl_total=1 -enable_parallel_scan_cdc=1 -print_interval_ms=1000 -cdc_physical_workers_only=1 -scan_physical_workers_only=1 -client_load_per_core=4500 -latency_stat_interval_ms=25" "-d 2 -e 0"

System-wide runtime options

-node_memory_gb: how many GBs of memory to allocate per socket.

-null_log_device: Whether to flush log buffer.

-tmpfs_dir: location of the log buffer's mmap file. Default: /tmpfs/.

cdc_threads: number of CDC threads.

scan_threads: number of scan threads.

enable_parallel_scan_cdc: Whether enable parallel scan and CDC.

Benchmark-specific runtime options

-d 2: DDL transaction starts after 2 seconds

-e 0: DDL workload, see below:

TPC-CD:
    0: Add column
    1: Table Split
    2: Preaggregate
    3: Create Index
    4: Table join
    9: Add constraint
    10: Add column and constraint
    
ODDLB:
    0: Add column
    2: Add constraint
    3: Add column and constraint

-w D: ODDLB - write-heavy workload D.

-s 1: TPC-CD - pick a random home warehouse, 0 if not

-s 100000000: ODDLB - number of records in the database table.

-r 10: 10 queries per transaction.

tesseract's People

Contributors

wangtzh avatar laoawilliam avatar yongjunhe avatar kangnuni avatar jason0214 avatar kaisonghuang avatar hippotas avatar ippokratis7 avatar kangnyeon avatar ziyi-yan avatar utashih avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.