
ConduitDB

ConduitDB is a horizontally scalable blockchain parser and indexer. It uses ScyllaDB with a tight database schema optimized for write throughput and makes limited use of Redis for managing some of the state between worker processes.

ConduitDB is designed to be a flexible, value-added service that is capable of doing all of the indexing and data storage work entirely on its own (without running a node at all). This can include storing and indexing the full 10TB+ blockchain with:

  • Raw transaction lookups
  • Merkle proof lookups
  • UTXO lookups
  • Historic address-based or pushdata-based lookups from genesis to chain tip. This is mainly to support seed-based restoration of wallets, but could also be used for _unwriter protocol namespace lookups and querying
  • Tip filter notification API gives notifications when pushdatas of 20, 33 or 65 bytes in length are seen, i.e. the pubkey hash (20 bytes) used in P2PKH addresses, or the public key used in P2PK (33 bytes compressed, 65 bytes uncompressed). Other use cases in token and diverse namespace protocols are also likely (a short pushdata sketch follows this list).
  • Output spend notifications API gives notifications when a particular UTXO is spent. This is useful for knowing that a broadcast transaction has indeed propagated to the mempool and subsequently been included in a block, or that it has been affected by a subsequent reorg. The SPV wallet can react accordingly to each of these events and fetch the new merkle proof, updating its local wallet database.
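
For concreteness, here is a minimal sketch of the 20/33/65-byte pushdatas the tip filter matches on: the 20-byte pubkey hash found in a P2PKH locking script and the 33-byte compressed public key found in a P2PK script. This is not ConduitDB's client API, and keying registrations by SHA-256 of the raw pushdata bytes is an assumption made for illustration only.

import hashlib

# Sketch only: the 20/33/65-byte pushdatas the tip filter watches for.
# Assumption: registrations are keyed by SHA-256 of the raw pushdata bytes.

# 33-byte compressed public key (hypothetical value) - the pushdata in a P2PK script.
compressed_pubkey = bytes.fromhex(
    "0279be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798")

# 20-byte pubkey hash, i.e. HASH160 of a public key (hypothetical value) - the
# pushdata in a P2PKH script.
pubkey_hash = bytes.fromhex("751e76e8199196d454941c45d1b3a323f1433bd6")

for pushdata in (compressed_pubkey, pubkey_hash):
    assert len(pushdata) in (20, 33, 65)
    print(len(pushdata), hashlib.sha256(pushdata).hexdigest())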

It is the opinion of the author that these APIs cover all of the basic requirements for an SPV wallet or application.

You can also turn off unnecessary features if these APIs can already be covered by other services such as a local bitcoin node, or if you want to fetch raw transactions from a 3rd party API such as conduitdb.com / whatsonchain.com / TAAL. This flexibility avoids storing or indexing the 10TB+ blockchain twice.

ConduitDB therefore fits into your existing stack to add value in a way where you only pay for what you use. It can either be run in a lightweight mode providing the bare minimum required to support SPV wallets / applications like ElectrumSV, or it can be configured to store and index absolutely everything, or anything in between - perhaps only scanning for a token sub-protocol or bitcom namespace of interest to you and your application.

An instance that indexes and stores everything is running at conduit-db.com and is free for public use but may need to be rate-limited depending on demand.

Licence MIT
Language Python 3.10
Author Hayden Donnelly (AustEcon)

Getting Started

ConduitDB deployment is Docker-based only. ScyllaDB only loses 3% performance in Docker when properly configured.

ConduitDB connects to the p2p Bitcoin network so it doesn't technically require you to run your own full node. However, ideally you will have a bare metal, localhost bitcoind instance. You can get the latest version from here (https://www.bsvblockchain.org/svnode). See the notes below on using a non-localhost node.

Once you have a node to connect to, update the .env.docker.production config file where it says:

NODE_HOST=127.0.0.1
NODE_PORT=8333

Set the other configuration options in .env.docker.production such as:

SCYLLA_DATA_DIR=./scylla/data
CONDUIT_RAW_DATA_HDD=./conduit_raw_data_hdd
CONDUIT_RAW_DATA_SSD=./conduit_raw_data_ssd
CONDUIT_INDEX_DATA_HDD=./conduit_index_data_hdd
CONDUIT_INDEX_DATA_SSD=./conduit_index_data_ssd
REFERENCE_SERVER_DIR=./reference_server

ConduitDB makes deliberate use of slow (HDD) vs fast (SSD/NVMe) storage volumes for the data directories. Raw blocks and long arrays of transaction hashes are written sequentially to HDD to economise on storage costs. SSD/NVMe is used for memory-mapped files and of course ScyllaDB.

By default all directories will be bind mounted (by Docker Compose) into the above locations at the root directory of this cloned repository. If you're running in prune mode (deleting raw block data after parsing), then the defaults will work well for you on NVMe storage. This is the recommended configuration.

Running the production configuration is only supported on Linux. Building the python_base image only needs to be done once (and again every time requirements.txt changes).

docker build -f ./contrib/python_base/Dockerfile . -t python_base

If you really want to test out the production configuration on Windows you could use WSL.

./run_production.sh

To tail the docker container logs:

./tail_production_logs.sh

Development

Python Version

Currently Python 3.10 is required

Build all of the images

docker build -f ./contrib/python_base/Dockerfile . -t python_base
docker-compose -f docker-compose.yml build --parallel --no-cache

Run static analysis checks

./run_static_checks.bat

Or on Unix:

./run_static_checks.sh

Run all functional tests and unittests locally

./run_all_tests_fresh.bat

Or on Unix:

./run_all_tests_fresh.sh

Running ConduitRaw & ConduitIndex

Windows cmd.exe:

git clone https://github.com/conduit-db/conduit.git
cd conduit-db
set PYTHONPATH=.

Now install packages and run ConduitRaw (in one terminal)

py -m pip install -r .\contrib\requirements.txt
py .\conduit_raw\run_conduit_raw.py

And ConduitIndex (in another terminal)

py -m pip install -r .\contrib\requirements.txt
py .\conduit_index\run_conduit_index.py

Unix Bash terminal:

git clone https://github.com/conduit-db/conduit.git
cd conduit-db
export PYTHONPATH=.

Now install packages and run ConduitRaw (in one terminal)

python3 -m pip install -r ./contrib/requirements.txt
python3 ./conduit_raw/conduit_server.py

And ConduitIndex (in another terminal)

python3 -m pip install -r ./contrib/requirements.txt
python3 ./conduit_index/conduit_server.py

Configuration

All configuration is done via the .env files in the top level directory.

  • .env is for bare metal instances of ConduitDB services when iterating in active development
  • .env.docker.development is for the docker-compose.yml which is used for automated testing and the CI/CD pipeline
  • .env.docker.production is for the docker-compose.production.yml which is used for production deployments.

Notes on using a non-localhost node

Indexing from a non-localhost node is an experimental feature at present, but I hope to improve upon this at a later date.

If you still want to go ahead with connecting to a remote node, ideally your IP address should be whitelisted on that node and it should be a low-latency connection (i.e. ideally in the same data center or at least in the same geographical region). Remote nodes that have not whitelisted you will likely throttle your initial block download.

Even with a local bitcoin node, if its main raw block storage is on a magnetic hard drive, it will max out the sequential read capacity of the drive at around 200MB/sec. This becomes the speed limit for everything.
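
A quick back-of-the-envelope calculation illustrates why this matters for initial block download (the figures are order-of-magnitude assumptions):

chain_size_bytes = 10 * 10**12   # ~10TB+ blockchain
hdd_read_rate = 200 * 10**6      # ~200MB/sec sustained sequential read
hours = chain_size_bytes / hdd_read_rate / 3600
print(f"~{hours:.0f} hours just to read the raw blocks once")   # ~14 hours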

Acknowledgements

  • This project makes heavy use of the bitcoinx bitcoin library created by kyuupichan for tracking headers and chain forks in a memory-mapped file
  • The idea for indexing pushdata hashes came from discussions with Roger Taylor, the lead maintainer of ElectrumSV

Issues

Handle mempool table overflow

See max_heap_table_size in the MariaDB my.cnf file. Generally I set this to 10GB.

It should ideally function as a cache where lower fee txs get evicted and the higher fee txs remain...
But it may not need to be that sophisticated for the first iteration.

The number one priority is just to prevent OOM'ing.
A secondary objective would be to match the node's eviction criteria (a rough sketch of such a cache follows below).
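
A minimal sketch (names and structure are assumptions, not ConduitDB code) of the fee-rate-bounded cache described above:

import heapq

class FeeBoundedMempoolCache:
    # Sketch only: keeps at most max_entries txs and evicts the lowest
    # fee-rate entries first. The real criteria would ideally mirror the node's.
    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries
        self.txs: dict[bytes, bytes] = {}            # tx_hash -> rawtx
        self.heap: list[tuple[float, bytes]] = []    # (fee_rate, tx_hash) min-heap

    def add(self, tx_hash: bytes, rawtx: bytes, fee_rate: float) -> None:
        self.txs[tx_hash] = rawtx
        heapq.heappush(self.heap, (fee_rate, tx_hash))
        while len(self.txs) > self.max_entries:
            _, evict_hash = heapq.heappop(self.heap)
            self.txs.pop(evict_hash, None)           # lowest fee rate goes first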

Prune Mode

This would allow ConduitRaw to still function normally while relying on the node to store the raw block data (e.g. a miner we could co-locate with).

It should be rather simple to implement. It would basically just redirect HTTP requests to the node instead and make ConduitRaw prune unwanted raw block data (but only after ConduitIndex has synchronised it). A rough sketch of this pruning rule follows.
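
This sketch assumes raw blocks are stored as flat files named by block hash; the file layout and the "already indexed" check are assumptions, not ConduitDB's actual implementation.

from pathlib import Path

def maybe_prune_raw_block(block_hash_hex: str, raw_blocks_dir: Path,
        indexed_block_hashes: set[str]) -> None:
    # Sketch only: delete the raw block file, but only once ConduitIndex has
    # synchronised that block; the raw data could then be re-fetched from the
    # node over HTTP if it were ever needed again.
    if block_hash_hex not in indexed_block_hashes:
        return
    block_file = raw_blocks_dir / block_hash_hex
    if block_file.exists():
        block_file.unlink()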

Todo list

  • Get mempool txs from the node directly (p2p) not via conduit_raw
  • Batch mempool txs for bulk flushing to MySQL
  • Test conduit_raw on HDD -> For full archive of mainnet I think it makes a lot of sense but should segregate raw block data
    from LMDB and store raw blocks as flat files outside of LMDB. (Result: Performs very well)
  • Add back the post-IBD mode event to conduit_index if the unnecessary mempool lookups show any slowdown for IBD
    (may be unnecessary now with fully in-memory mempool lookup table)
  • Upgrade db tables from height -> block num (with a separate table for block hash lookups)
  • Tune rocksdb for write-heavy loads
  • Script for printing out blockchain metrics such as cumulative blockchain size and total tx count etc., for translating logged timestamp progress into meaningful metrics about what to expect under ever-increasing load.
  • ShortHashing - essential for cutting footprint of index on NVMe storage and in-memory caches / network IO.
  • Return full tx hashes and pushdata hashes in the API
  • Test coverage of internal aiohttp API (json)
  • TSC merkle proof endpoint
  • Run tests in Azure pipeline
  • Reorg handling
  • Db repair on startup needs to do index-only lookups for the deletes so that it scales.
  • Add reorg functional testing
  • Pylint checks passing
  • Mypy checks passing
  • Tx offset data - move to flat files

Top priority post-MVP launch

  • Add db integrity check for conduit_raw (the same as is already done and working well for conduit_index).
  • Swagger documentation page (use Redoc https://github.com/Redocly/redoc)
  • Nginx proxy to add https://

Not required for MVP launch:

  • Simplify p2p network client
  • Dynamically adjust size of main batch size as the moving average block size increases (otherwise python dictionary caches etc. will get overly large if storing metadata for numerous large blocks at a time)
  • Make all configuration via a single top-level .env file

Parallel block download over p2p (in order to eventually ditch the node)

We need a HeaderSV-like "Peer manager" & "Chain tip tracker". It could be named something like: ConduitConnect

Its most important function would be to act as a pre-fetcher for raw blocks in parallel on behalf of ConduitRaw.

ConduitRaw would send it a request for blocks from e.g. height 1000 - 1500.

ConduitConnect would now divide up the work into contiguous sections of blocks.
If there were 10 peers there would be 10 x 50 header chunks, e.g. heights 1000 - 1050, 1050 - 1100 and so on.
There would be 10 open p2p socket connections - one for each node.
There would be a max network buffer of 128MB per connection, therefore a max memory usage of 10 x 128MB = 1280MB for streaming raw blocks.

A block < 128MB in size is considered a SMALL_BLOCK type, and so ConduitConnect would respond to ConduitRaw with this raw block immediately (keeping the small block in memory / network buffers the whole time).

A block > 128MB in size is considered a LARGE_BLOCK type, and so ConduitConnect would stream the raw block to disc first to avoid OOM'ing. The filename would be the block hash in hex and the file would be written to a kind of "staging area" (the file can simply be renamed and moved later to the proper location).

  • If LARGE_BLOCK type then ConduitRaw would receive a different response: it would be given the location of the raw block (already written to disc), and to process it (the merkle tree info, tx offsets etc.) it would read it from disc. Once again, this processing should be streams-based and should not need to load the whole block into memory at once. At the end it would NOT re-write the block to disc; it would just move / rename the file, which should be instantaneous. (A rough sketch of the chunking and size classification follows below.)
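
A rough sketch of the chunking and size classification described above (all names and the exact chunking rule are assumptions, not settled design):

NETWORK_BUFFER_SIZE = 128 * 1024**2   # 128MB per-peer network buffer

def allocate_chunks(start_height: int, stop_height: int, peer_count: int) -> list[tuple[int, int]]:
    # Split [start_height, stop_height) into contiguous chunks, one per peer,
    # e.g. heights 1000-1500 over 10 peers -> 10 chunks of 50 headers each.
    total = stop_height - start_height
    chunk_size = -(-total // peer_count)          # ceiling division
    chunks = []
    for i in range(peer_count):
        lo = start_height + i * chunk_size
        hi = min(lo + chunk_size, stop_height)
        if lo < hi:
            chunks.append((lo, hi))
    return chunks

def classify_block(block_size: int) -> str:
    # SMALL_BLOCKs fit in the network buffer and are relayed from memory;
    # LARGE_BLOCKs are streamed to disc first to avoid OOM'ing.
    return "SMALL_BLOCK" if block_size < NETWORK_BUFFER_SIZE else "LARGE_BLOCK"

assert allocate_chunks(1000, 1500, 10)[0] == (1000, 1050)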

Full hashes for pushdata search

To continue returning full hashes in the response there are a few tasks that need to be completed:

  • Change tx table from height -> block num
  • Use the block num + tx position in the block for either the output or input tx_hashX to fetch the full tx_hash from LMDB (using the merkle tree db with an array of ordered tx_hashes for each block).
  • This endpoint could be provided either via the socket server or via an aiohttp server. I think I lean towards socket server for now to keep everything under "one roof" - however, the main bottleneck will be random seeks on HDD for the tx hashes.
  • The pushdata hash is already given in the request so can be matched up again - no need to re-parse the rawtx for it.
  • Internally ConduitRaw would keep a small LRU cache for hashX -> full hash, given the likelihood of recent batches overlapping on the same txs (a sketch follows below).
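
A minimal sketch of that LRU cache; the fetch callable stands in for the LMDB merkle-tree lookup and all names are assumptions:

from collections import OrderedDict
from typing import Callable

class HashXToFullHashCache:
    # Sketch only: small LRU cache mapping a truncated hashX to the full
    # 32-byte tx hash, refilled via a lookup keyed by block num + tx position.
    def __init__(self, fetch: Callable[[int, int], bytes], max_entries: int = 10_000) -> None:
        self.fetch = fetch                            # e.g. LMDB lookup
        self.max_entries = max_entries
        self.cache: OrderedDict[bytes, bytes] = OrderedDict()

    def get(self, hashX: bytes, block_num: int, tx_position: int) -> bytes:
        full_hash = self.cache.get(hashX)
        if full_hash is not None:
            self.cache.move_to_end(hashX)             # mark as recently used
            return full_hash
        full_hash = self.fetch(block_num, tx_position)
        self.cache[hashX] = full_hash
        if len(self.cache) > self.max_entries:
            self.cache.popitem(last=False)            # evict least recently used
        return full_hash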
