
Sparkle-based PancakeSwap subgraphs


This repo holds the exchange and prediction subgraphs from PancakeSwap that make use of the sparkle indexing framework. They are developer previews for The Graph Network.

Usage

Build with:

go build -o pancakeswap-exchange ./cmd/exchange

Call the graph-node to create the deployment:

./pancakeswap-exchange create namespace/target_name    # create a row in the `subgraph` table (current_version = nil, previous_version = nil)

Deploy it, as you would with normal subgraphs:

./pancakeswap-exchange deploy namespace/target_name    # create rows in `subgraph_deployment` & `subgraph_version`, upload to IPFS, create the `deployment_schemas` row & update the `subgraph` table's current_version, previous_version (maybe)

Running linearly

Start the linear indexer:

./pancakeswap-exchange index namespace/target_name@VERSION

You will find the VERSION printed when you deploy the subgraph.

Running in parallel

NOTE: As of August 17th, 2021, `step` is in the process of being renamed to `stage` everywhere, so adjust the commands below according to your revision.

These are approximate commands to run the full flow:

Stage 1

./pancakeswap-exchange parallel step \
      --step=1 \
      --blocks-store-url ./path/to/blocks \
      --start-block 6809700 \
      --stop-block 6829699 \
      --rpc-endpoint http://some-full-node:8545 \
      --output-path ./path/to/stage1

The ./path/to/blocks represents the firehose merged blocks files. They will be consumed directly and do not require a running firehose service.

Paths for --blocks-store-url, --input-path and --output-path use the dstore library which supports S3, Azure, GCP and filesystems.
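
For illustration, the same flags can point at different backends. The exact URL schemes depend on your dstore version, so treat these as assumptions rather than a reference:

--blocks-store-url ./path/to/blocks        # local filesystem
--blocks-store-url s3://my-bucket/blocks   # S3-compatible object store (bucket name is illustrative)
--blocks-store-url gs://my-bucket/blocks   # Google Cloud Storage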

You will then adjust --start-block and --stop-block and repeat the process until you cover the desired chain segment (as close to the chain tip as possible). Both parameters are inclusive, so make sure your segments do not overlap.

These processes can be run in parallel.
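
As a minimal sketch, here is one way to launch several Stage 1 segments in parallel from a shell, assuming 20,000-block segments starting at 6809700 (adjust the ranges and the number of jobs to your setup):

# launch one Stage 1 process per 20,000-block segment, in the background
for start in 6809700 6829700 6849700 6869700; do
  stop=$((start + 19999))   # --start-block and --stop-block are inclusive
  ./pancakeswap-exchange parallel step \
      --step=1 \
      --blocks-store-url ./path/to/blocks \
      --start-block $start \
      --stop-block $stop \
      --rpc-endpoint http://some-full-node:8545 \
      --output-path ./path/to/stage1 &
done
wait   # all segments must finish before moving on to Stage 2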

Stage 2

./pancakeswap-exchange parallel step \
      --step=2 \
      --blocks-store-url ./path/to/blocks \
      --start-block 6809700 \
      --stop-block 6829699 \
      --rpc-endpoint http://some-full-node:8545 \
      --input-path ./path/to/stage1 \
      --output-path ./path/to/stage2

Again, repeat with adjusted start/stop blocks. Ensure Stage 1 is fully complete before starting Stage 2.

Stage 3 is the same as Stage 2

Simply adjust --step, --input-path and --output-path.

Stage 4 is slightly different:

./pancakeswap-exchange parallel step \
      --step=4 \
      --blocks-store-url ./path/to/blocks \
      --start-block 6809700 \
      --stop-block 6829699 \
      --rpc-endpoint http://some-full-node:8545 \
      --input-path ./path/to/stage3 \
      --output-path ./path/to/entities \
      --flush-entities \
      --store-snapshot=false

This stage writes entities for every single block, not only a snapshot at the end (which the previous stages produce to feed the latest entity state into the next stage).

At this point, you can inspect the files in ./path/to/entities for data quality. These are the values that will end up in PostgreSQL in the next steps, so you can iterate on your process much faster if you discover data quality issues at this stage. You can also be selective about which stages and which segments you really need to re-run if you tweak the handler code.
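
A rough sketch of such an inspection from the shell, assuming the Stage 4 output is plain files under ./path/to/entities (Stage 4 outputs entities as JSON per the benchmark notes, but the exact file naming and compression depend on your revision):

# list the files Stage 4 produced (names and layout depend on your revision)
find ./path/to/entities -type f | sort | head
# peek at the beginning of one of them; pipe through zcat first if they are gzip-compressed
head -c 2048 "$(find ./path/to/entities -type f | head -n 1)"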

The next step is to create the CSV files (the to-csv command).

This splits the workload on a different axis: one process per entity collection. The worst-case timing for this operation is therefore the largest table (the one with the most rows or most data).

./pancakeswap-exchange parallel to-csv \
      --input-path ./path/to/entities \
      --output-path ./path/to/csvs \
      --only-tables=swap \
      --chunk-size=20000 \
      --stop-block=0

Notice the --only-tables flag here, which allows you to limit the tables generated by this process. If you don't specify it, it runs for all tables: you only need to run it once (no parallel processing), but you won't gain the performance of parallelisation. If you do split the work, use ./pancakeswap-exchange --help to list all the entities managed by this subgraph and run one process for each entity, as sketched below.
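
A minimal sketch of splitting the work that way, assuming pair, token and swap are among the entities listed by --help for this subgraph:

# run one to-csv process per entity, in parallel
for table in pair token swap; do
  ./pancakeswap-exchange parallel to-csv \
      --input-path ./path/to/entities \
      --output-path ./path/to/csvs \
      --only-tables=$table \
      --chunk-size=20000 \
      --stop-block=0 &
done
wait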

The --chunk-size flag specifies where to chunk the .csv files. This only scopes the size of the PostgreSQL transaction when injecting into postgres, as each file can be added atomically in the RDBMS. If one fails, you'll know sooner, and you can retry it without retrying one huge .csv file.

Drop indexes

Truncating the target tables is required to avoid duplicate data.
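
As a hedged illustration only (the schema name, e.g. sgd1, comes from graph-node's deployment_schemas table, and the table names from your subgraph's entities):

# hypothetical example: truncate one target table before injecting fresh CSVs
psql "postgresql://user:password@[host]:5432/graph?sslmode=disable" \
     -c 'TRUNCATE TABLE sgd1.swap;'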

Dropping the indexes will ensure the fastest injection time possible:

./pancakeswap-exchange drop-indexes \
      pancakeswap/exchange-v2@[version]  \
      --psql-dsn postgresql://user:password@[host]:5432/graph?sslmode=disable

This method also supports --only-tables, but it's quite fast so you shouldn't need to parallelize it.

Inject into PostgreSQL

Once the CSV files are all generated, the next step is to inject them into PostgreSQL:

./pancakeswap-exchange parallel inject \
      pancakeswap/exchange-v2@[version]  \
      postgresql://user:password@[host]:5432/graph?sslmode=disable  \
      --only-tables=swap  \
      --input-path ./path/to/csvs  \
      --stop-block=0

Here again, you can parallelize the injection to maximize throughput in PostgreSQL and use all the available cores. The inject method uses the fastest injection method available: COPY FROM, piping the CSV through.
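
As with to-csv, a rough sketch of running one inject process per entity (again assuming pair, token and swap are among this subgraph's entities):

for table in pair token swap; do
  ./pancakeswap-exchange parallel inject \
      pancakeswap/exchange-v2@[version] \
      "postgresql://user:password@[host]:5432/graph?sslmode=disable" \
      --only-tables=$table \
      --input-path ./path/to/csvs \
      --stop-block=0 &
done
wait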

Re-create indexes

Now re-create the indexes:

./pancakeswap-exchange create-indexes \
      pancakeswap/exchange-v2@[version]  \
      --psql-dsn postgresql://user:password@[host]:5432/graph?sslmode=disable

You can also use --only-tables, but the process itself sleeps pretty much all the time, as PostgreSQL is doing the hard work. Indexes here will be created in parallel where possible directly from within the process, and linearly when it is known that Postgres would otherwise lock.

Benchmark

Here are a few durations (the worst-case timing of each parallel job) and storage size estimates from a sample run of this subgraph on GCP, from block 6,809,700 to 9,000,000, split into 160 parallel jobs (in stages 1-4).

The following durations are the worst case from all the 160 parallel jobs at each stage:

  • Stage 1 duration: 5 minutes, produced: 412.0 MiB of data (pairs)
  • Stage 2 duration: 8 minutes 33 seconds, produced: 21.61 GiB of data (pair reserves)
  • Stage 3 duration: 34 minutes, produced: 31.21 GiB of data (price computations)
  • Stage 4 duration: 42 minutes, produced: 512.22 GiB of data (output of all entities to JSON)

The following duration is for the worst case of the Entities (Swap):

  • to-csv duration: 157 minutes (2h37m), produced: 242.27 GiB of data (CSV files ready to be injected in postgres)

The following duration is for the worst case of the Entities during injection (Swap), with all indexes dropped before injection:

  • inject duration: 40 minutes

The following duration is the worst case of the Entities, provided you have more than 640GB of RAM. Otherwise, you might want to split the load in two (taking twice the time):

  • create-indexes duration: 4 hours

For comparison, the original (non-parallel) indexing run was only halfway through after 2 months.

Updating sparkle

Generate the code using sparkle (see https://github.com/streamingfast/sparkle):

sparkle codegen ./subgraph/exchange.yaml github.com/streamingfast/sparkle-pancakeswap
go mod tidy

To initialize a database:

go run ./cmd/exchange -- deploy \
   --psql-dsn="postgresql://postgres:@localhost:5432/YOUR_DATABASE?enable_incremental_sort=off&sslmode=disable" \
   project/subgraph

Contributing

Issues and PRs in this repo should relate strictly to the generated PancakeSwap subgraphs.

Report any protocol-specific issues in their respective repositories.

If you wish to contribute to this code base, please first refer to the general StreamingFast contribution guide.

License

Apache 2.0

