Comments (13)

edmundnoble commented on August 16, 2024

Thank you for your report.

The config flags mentioned in #1613 may be able to decrease mempool latency, but they do so by increasing the amount of mempool-related network traffic on the P2P network. They decrease the number of hops between nodes by allowing each node to sync to 30 other nodes rather than 6, and they make nodes poll each other for new transactions every 10 seconds rather than every 30 seconds. Telemetry on our own nodes suggests that the mempool accounts for the majority of the traffic on the P2P network, making these measures a somewhat risky proposition.
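
If it helps anyone experimenting, here is a sketch of those two knobs using the key names quoted later in this thread (pollInterval, maxSessionCount); I am assuming they sit under a mempool P2P section of the chainweb-node YAML config, so verify the exact nesting against your own node's config before relying on it:

    chainweb:
      mempoolP2p:
        pollInterval: 10      # poll peers for new txs every 10 seconds instead of every 30
        maxSessionCount: 30   # sync the mempool with up to 30 peers instead of 6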

There have been efforts internal to Kadena to optimize the architecture of the mempool to a number of ends, including decreasing its propagation delay and decreasing the amount of network traffic it produces. However, I'm sorry to say that they haven't been brought to the finish line yet, partly because they are such a big undertaking to do correctly: we need to experiment, monitor, and perhaps even simulate these changes before putting them into production. And frankly, the delay has seemed fine for most users, even if it's not as short as it could be. But it is on our long-term roadmap to revive these efforts.

wwared commented on August 16, 2024

I believe that both chainweb.ecko.finance and api.chainweb.com are deployments with a load balancer in front of more than one node (so sending the same transaction to it twice can actually make it reach two different chainweb-node instances behind the load balancer), while the kadena2.app.runonflux.io deployment is a single chainweb-node instance hosted directly on Flux. So the config issues mentioned in #1613 might indeed be related to the problem. However, I'd still think the issue would be replicated sometimes even with a single-node deployment, no? It would affect every node running the default mempool settings, and thus have an impact through network effects.

I'll reach out to the Ecko team and update the config used for their nodes with the mempool changes from #1613, to help anyone trying to investigate this issue. Will update here after the config change has gone live.

larskuhtz commented on August 16, 2024

Here is a typical mempool propagation profile of a pending transaction:
[chart: mempool propagation profile of a pending transaction]
The time on the x-axis is in seconds. The y-axis shows the number of mempools/nodes where the tx is present.

Lifecycle of a Pending Transaction

A typical transaction transitions through the following phases:

  1. Tx is submitted to a single mempool where it stays for a while until the next sync with a peer happens.
  2. Tx is gossiped between mempools. The number of mempools where the tx is available grows exponentially over time. The curve flattens a bit once the network is almost saturated (see the sketch after this list).
  3. When the tx arrives at mining nodes it waits until the currently mined parent block is resolved.
  4. When the previous block is resolved, the tx gets included in the next block (assuming the block is not completely filled with txs of higher priority). It is then deleted from the mempool of the mining nodes but remains pending in the mempools of all non-mining nodes.
  5. When the block with the tx is resolved, the block is propagated (with high priority) throughout the network. When a node validates that block, the tx is deleted from the mempool of that node.
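
A minimal sketch of phase 2's gossip growth (referenced above), assuming each mempool holding the tx gets polled by a fixed number of random peers per round; the node count, peer count, and round count are illustrative, not chainweb-node's actual parameters:

    import random

    def simulate_gossip(num_nodes=3000, peers_per_round=6, rounds=10):
        """Track how many mempools hold the tx after each polling round."""
        has_tx = {0}  # the tx starts out in a single mempool
        history = [len(has_tx)]
        for _ in range(rounds):
            new_holders = set()
            for _node in has_tx:
                # each holder is synced by a handful of randomly chosen peers
                new_holders.update(random.sample(range(num_nodes), peers_per_round))
            has_tx |= new_holders
            history.append(len(has_tx))
        return history  # grows roughly exponentially, then flattens

    print(simulate_gossip())

The printed counts grow about sevenfold per round and then flatten as the simulated network saturates, the same S-curve shape as the profile above.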

Transaction Propagation Histogram

The following chart shows the distribution of how long it takes transactions to be propagated to 90% of all nodes in the network:

[chart: distribution of transaction propagation times]

Most transactions take less than 50 seconds to propagate to almost all nodes in mainnet.

Remarks

The time that a tx waits for the previous block to be mined, and the time it takes to mine the block that includes the tx, do not depend on the implementation of the mempool. In particular, they have nothing to do with the implementation or parameterization of the P2P network. Typically this makes up at least 50% of the lifetime of a pending transaction.

How fast a tx gets included in a block depends on how long it has to wait for the previous block to be mined after the tx arrives at a mining node. There is a relatively high chance that it arrives at one mining node right before mining of a new block starts and at another mining node when mining has already started. Depending on which mining node wins the block, the transaction may be included at the respective block height or may "skip" that block.

Block mining times are distributed exponentially. That distribution has a lot of mass at shorter block times to make up for the long tail, which means many blocks are mined very quickly (within just a few seconds). Therefore it can regularly happen that transactions "skip" more than one block when they arrive at different mining nodes just a few seconds apart. That, however, doesn't necessarily mean that those transactions have to "wait" longer to be included in "some" block, because those blocks may just have been mined more quickly. On the other hand, it can take a long time until a transaction is included in the next block if that block (or the previous block) happened to take a long time to mine. Therefore, just counting block heights (or "skipped" blocks) is not a good measure of tx latency.
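
To put rough numbers on "many blocks are mined very quickly", here is a quick sketch assuming block times are exponentially distributed with a 30-second mean (the mean is the only assumed parameter):

    import math

    MEAN = 30.0  # assumed average block time in seconds

    def prob_block_within(t):
        """P(block time <= t) for an exponential distribution with mean MEAN."""
        return 1 - math.exp(-t / MEAN)

    print(f"P(block within  5s) = {prob_block_within(5):.1%}")   # ~15.4%
    print(f"P(block within 10s) = {prob_block_within(10):.1%}")  # ~28.3%
    print(f"median block time   = {MEAN * math.log(2):.1f}s")    # ~20.8s

So about one block in six is mined within five seconds, and half of all blocks within roughly 21 seconds, even though the mean is 30; that front-loading is what lets a few seconds of arrival skew at different mining nodes turn into a skipped height.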

masch1na commented on August 16, 2024

I think it's important to note that this test was performed against the "chainweb.ecko.finance" node, which is used by Ecko (Kaddex). The DEX has far too little traffic for me to believe the delay is caused by any overload; there were almost no swaps. This node is used by the go-to DEX (arguably), so these symptoms will be felt by the majority of Kadena ecosystem users, which is why I think it's worth investigating.

A separate test against "api.chainweb.com" produced pretty much the same outcome.

However, I was not able to replicate this issue when using my own node or "kadena2.app.runonflux.io", which is used by KDSwap.

Since I wasn't able to replicate this issue on my node or the KDSwap node, I wanted to close this report out, but I'll leave it open for any thoughts from the team.

trendzetter commented on August 16, 2024

It could be useful to compare the actual config options set and the different outcomes.

trendzetter commented on August 16, 2024

I didn't actually test the 2 transactions at the same time, but here is the related discussion: #1613

masch1na commented on August 16, 2024

Ah, I wasn't aware there was a similar thread already open. If there is a way to merge them and you think it's a good idea, please feel free to do so.

With regards to comparing node configs and your suggestions in #1613 - my node is actually set up in a similar way; those configs stood out to me as well.

I think it would be worth comparing configs. There used to be a "https://api.chainweb.com/chainweb/{apiVersion}/mainnet01/config" endpoint, as documented at https://api.chainweb.com/openapi/#tag/config/paths/~1config/get - but unfortunately it doesn't seem to work anymore. I'm not sure how we would get the configs from nodes other than from the node operators themselves, which is unlikely to happen.

The only thing that makes me doubt that these configurations are the cause is that in my test the TXs were sent to the same node. It's very unlikely that I failed to land both TXs inside the same "30 second" poll interval on so many occasions (see the rough model below).
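
As a rough back-of-the-envelope model of why that would be unlikely (entirely my own assumptions: the two submissions land about one second apart, poll boundaries fall uniformly at random, and a dozen attempts is used as an illustrative count):

    from math import comb

    POLL = 30.0  # assumed poll interval in seconds
    GAP = 1.0    # assumed gap between the two submissions

    # chance that a poll boundary falls between the two txs, splitting them
    p_split = GAP / POLL
    # chance of seeing 4 or more splits across 12 independent attempts
    p_many = sum(comb(12, k) * p_split**k * (1 - p_split)**(12 - k)
                 for k in range(4, 13))
    print(f"p_split = {p_split:.3f}, P(>=4 of 12) = {p_many:.1e}")  # ~5e-4

Under this toy model, repeatedly seeing the two txs split across poll intervals would be very improbable, which is why the poll interval alone doesn't seem to explain it.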

But regardless, I wish the config endpoint would work.

masch1na commented on August 16, 2024

I would like to add that since the recent 2.18.1 (2023-03-06) chainweb release, this problem seems to have gotten worse.

masch1na commented on August 16, 2024

Hey @wwared, thanks so much for your input. Let us know once the changes are live; eager to test.

Regardless of the fact that these are load-balanced nodes - and this is actually precisely the reason for this issue report - this shouldn't happen. If two different nodes each receive a transaction at the same time, those transactions should be mined at the same block height. There's no reason for them to be a block or two, sometimes even three, apart, with the blocks in between often being empty.

Also, it was my understanding that Flux nodes are in fact load-balanced as well.

I have some tests that I want to try with my own node, will report later when I have time.

masch1na commented on August 16, 2024

Hey wwared, very good info about those nodes being load balanced. This gave me an idea for another test.

I now ran a test sending two transactions at the same time to two different nodes: one to "eu-node-1.hypercent.io" and one to "kadena2.app.runonflux.io".

This time, out of 12 tests, the transactions were mined at different heights 4 times; twice they were actually two heights apart.

I think this is something worth the team looking into and improving. It might be as simple as tweaking the default values as mentioned in #1613, but I'm not sure.

masch1na commented on August 16, 2024

Thank you for acknowledging the issue. I understand this isn't something we can just change for fun to see how it goes, because it's going to have some impact that needs research/testing beforehand.

It's not the end of the world the way it is now, but the fact that transactions created at the same time are sometimes mined 2-3 heights apart just isn't right. You are right that the delay might be fine for a casual user who does a coin transfer once a month, but there are times when this isn't fine, hence the report. I guess I will have to figure out some workarounds... I have some specific real-life examples that are quite horrific, which I am happy to share in private with the devs. Please get in touch on Discord (masch1na#4142) if interested.

For future consideration when fixing this delay:

Instead of jumping from 6 to 30 nodes, maybe we can try 12. Instead of polling every 30 seconds, maybe we can try 15. These are less aggressive changes than #1613 but are still likely to be beneficial. I mean, if someone's infrastructure can't handle 12 API polls every 15 seconds, they might not be the best candidate for a node host anyway.

My current node configuration is this:

pollInterval: 10
maxSessionCount: 25
sessionTimeout: 150

I am running my node on a $300 Linux laptop connected to VDSL internet running through ancient telephone lines at 25 Mbps download / 3 Mbps upload, with a low-end router provided by my ISP. That's not the fiber connection most of the world already has access to, and it's running perfectly fine.

Every node operator should be expected to handle 3 Mbit/s of traffic, imo. Please see below for some network statistics using the node configuration listed above.

Some packet statistics captured in the last 40 minutes:

                                  rx   |             tx
--------------------------------------+------------------
  packets                     696725  |         1221584
--------------------------------------+------------------
          max                562 p/s  |        1114 p/s
      average                302 p/s  |         529 p/s
          min                206 p/s  |         400 p/s
--------------------------------------+------------------

Here are my recent hourly traffic rates ->

     hour        rx      |     tx      |    total    |   avg. rate
 ------------------------+-------------+-------------+---------------
 2023-03-23
     12:00    528,14 MiB |  379,67 MiB |  907,80 MiB |    2,12 Mbit/s
     13:00    528,39 MiB |  388,78 MiB |  917,16 MiB |    2,14 Mbit/s
     14:00    489,08 MiB |  378,74 MiB |  867,82 MiB |    2,02 Mbit/s
     15:00    488,48 MiB |  378,22 MiB |  866,70 MiB |    2,02 Mbit/s
     16:00    492,18 MiB |  380,34 MiB |  872,52 MiB |    2,03 Mbit/s
     17:00    502,52 MiB |  379,11 MiB |  881,62 MiB |    2,05 Mbit/s
     18:00    563,19 MiB |  392,62 MiB |  955,81 MiB |    2,23 Mbit/s
     19:00    640,39 MiB |  388,97 MiB |    1,01 GiB |    2,40 Mbit/s
     20:00    680,24 MiB |  393,86 MiB |    1,05 GiB |    2,50 Mbit/s
     21:00    705,74 MiB |  396,32 MiB |    1,08 GiB |    2,57 Mbit/s
     22:00    732,69 MiB |  397,92 MiB |    1,10 GiB |    2,63 Mbit/s
     23:00    725,19 MiB |  396,26 MiB |    1,10 GiB |    2,61 Mbit/s
 2023-03-24
     00:00    678,70 MiB |  394,62 MiB |    1,05 GiB |    2,50 Mbit/s
     01:00    731,52 MiB |  401,69 MiB |    1,11 GiB |    2,64 Mbit/s
     02:00    677,04 MiB |  392,48 MiB |    1,04 GiB |    2,49 Mbit/s
     03:00    709,33 MiB |  397,51 MiB |    1,08 GiB |    2,58 Mbit/s
     04:00    724,63 MiB |  401,83 MiB |    1,10 GiB |    2,62 Mbit/s
     05:00    567,24 MiB |  388,53 MiB |  955,76 MiB |    2,23 Mbit/s
     06:00    424,73 MiB |  375,80 MiB |  800,53 MiB |    1,87 Mbit/s
     07:00    422,03 MiB |  374,02 MiB |  796,05 MiB |    1,85 Mbit/s
     08:00    419,73 MiB |  370,86 MiB |  790,59 MiB |    1,84 Mbit/s
     09:00    422,52 MiB |  375,28 MiB |  797,81 MiB |    1,86 Mbit/s
     10:00    420,79 MiB |  373,39 MiB |  794,18 MiB |    1,85 Mbit/s
     11:00    245,12 MiB |  218,71 MiB |  463,84 MiB |    1,85 Mbit/s
 ------------------------+-------------+-------------+---------------

And here are the top 10 days by total data transferred ->

 #  day              rx     |     tx      |    total    |   avg. rate
----------------------------+-------------+-------------+---------------
1   2022-12-11    10,75 GiB |   10,94 GiB |   21,69 GiB |    2,16 Mbit/s
2   2022-12-10    10,74 GiB |   10,76 GiB |   21,50 GiB |    2,14 Mbit/s
3   2022-12-09    10,70 GiB |   10,70 GiB |   21,41 GiB |    2,13 Mbit/s
4   2022-12-06    10,93 GiB |   10,41 GiB |   21,33 GiB |    2,12 Mbit/s
5   2022-12-12    10,13 GiB |   10,80 GiB |   20,92 GiB |    2,08 Mbit/s
6   2022-12-07    10,13 GiB |   10,57 GiB |   20,70 GiB |    2,06 Mbit/s
7   2022-12-13     9,78 GiB |   10,68 GiB |   20,46 GiB |    2,03 Mbit/s
8   2022-12-08     9,86 GiB |   10,54 GiB |   20,39 GiB |    2,03 Mbit/s
9   2022-12-14     9,65 GiB |   10,61 GiB |   20,26 GiB |    2,01 Mbit/s
10  2022-12-15     9,58 GiB |   10,60 GiB |   20,18 GiB |    2,01 Mbit/s
----------------------------+-------------+-------------+---------------

jwiegley commented on August 16, 2024

  It's not the end of the world the way it is now, but the fact that transactions created at the same time are sometimes mined 2-3 heights apart just isn't right. You are right that the delay might be fine for a casual user who does a coin transfer once a month, but there are times when this isn't fine, hence the report. I guess I will have to figure out some workarounds... I have some specific real-life examples that are quite horrific, which I am happy to share in private with the devs. Please get in touch on Discord (masch1na#4142) if interested.

Hi @masch1na, I wanted to respond to the idea that the behavior you've noticed "isn't right":

In a distributed, decentralized system like Kadena, there is no guarantee that:

  1. Two transactions received at the same time will be gossiped to the network at the same time, due to queue lengths and the possibility that one message fits into the queue and the other has to wait until it empties;
  2. Two transactions gossiped at the same time will be re-gossiped by other nodes also at the same time;
  3. These two transactions, sitting in the mempool amid all the other P2P traffic, will be "right next to each other" in the mempool of the node used by the blockmaker, who may choose to put one into one block and the other into another, for any of a number of reasons.

The analogy to consider here is a lot more like putting two letters into your mailbox at home, at the same time and both destined for the same address. While it's likely that the postal person will pick them up at the same time, there is no guarantee that they will stay "bundled" throughout the process of sorting, shipping and delivery. It will probably happen often, but is in no way guaranteed. It could be days between their delivery, in fact, due to all sorts of other factors.

Kadena therefore offers no guarantee that transactions received contemporaneously at a node will be mined, distributed, and evaluated together. If you truly need two transfers to happen at the same time on chain, the only definite solution is to put the calls to coin.transfer in the same transaction.
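
For illustration, a minimal sketch of that approach; the account names and amounts are made up, and signing/capability details are elided:

    # Both transfers ride in a single Pact exec payload, so they are mined
    # atomically in one transaction (names and amounts are illustrative).
    pact_code = (
        '(coin.transfer "alice" "bob" 1.0) '
        '(coin.transfer "alice" "carol" 2.5)'
    )
    # Sign and submit pact_code as one command (e.g. via a node's Pact /send
    # endpoint); the two transfers then succeed or fail together.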

trendzetter commented on August 16, 2024

The important new part that I have a hard time fully grasping here is that we are seeing bursts of blocks, which might make transactions seem to skip multiple blocks. The "30 second block time" may give a skewed impression of reality because it's an average.
