0xproject / 0x-mesh
A peer-to-peer network for sharing 0x orders
Home Page: https://0x-org.gitbook.io/mesh/
License: Other
The db package should include helper methods for indexing and querying common types like bool, string, and int. Currently, we rely on callers to convert these types to/from bytes.
Currently, 0x Mesh uses FloodSub. For the beta release, we want to implement WeijieSub (i.e. the "order sharing algorithm" described in the Architecture Doc).
Currently one can only call mesh_addOrders via a WebSocket connection. It should also be reachable via HTTP.
Currently we use JSON, but it would be better to make this configurable. The default should probably also be a more space-efficient encoding.
Right now we generate a new private key and peer ID each time 0x Mesh starts. We should use persistent identities instead, which probably involves loading in a private key from a file.
See Line 200 in 9c12e0b.
A Mesh node would expose telemetry information, such as Go runtime metrics and network/Mesh-specific information, that can be used to monitor its health and status.
To gather and expose those metrics it would use Prometheus and its official Go client library.
Rather than being pushed, those metrics would be scraped from the node itself over HTTP.
Whether or not a running node will expose telemetry is configured via ENV_VARS:
type Config struct {
    ...
    // RunTelemetry is whether to expose mesh metrics
    RunTelemetry bool `envvar:"RUN_TELEMETRY" default:"false"`
    // TelemetryPort is the port on which metrics are exposed
    // :<TelemetryPort>/metrics
    TelemetryPort int `envvar:"TELEMETRY_PORT" default:"3000"`
}
To add a custom metric, we first define it:
package telemetry
import (
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
)
var (
AddOrdersRequests = promauto.NewCounter(prometheus.CounterOpts{
Name: "mesh_jrpc_request_add_orders_total",
Help: "The total number of mesh_addOrders JRPC requests",
})
AddPeerRequests = promauto.NewCounter(prometheus.CounterOpts{
Name: "mesh_jrpc_request_add_peer_total",
Help: "The total number of mesh_addPeer JRPC requests",
})
)
And then invoke its method; in the case of a Counter, we increment it:
// AddOrders calls rpcHandler.AddOrders and returns the validation results.
func (s *rpcService) AddOrders(orders []*zeroex.SignedOrder) (*zeroex.ValidationResults, error) {
telemetry.AddOrdersRequests.Inc()
return s.rpcHandler.AddOrders(orders)
}
Example Metrics pulled from the example implementation:
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 950272
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 7.25486e+07
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 13
# HELP mesh_invalid_orders_seen The total number of invalid orders mesh has rejected via JSON RPC
# TYPE mesh_invalid_orders_seen counter
mesh_invalid_orders_seen 0
# HELP mesh_jrpc_request_add_orders_total The total number of mesh_addOrders JRPC requests
# TYPE mesh_jrpc_request_add_orders_total counter
mesh_jrpc_request_add_orders_total 173
# HELP mesh_jrpc_request_add_peer_total The total number of mesh_addPeer JRPC requests
# TYPE mesh_jrpc_request_add_peer_total counter
mesh_jrpc_request_add_peer_total 0
# HELP mesh_p2p_invalid_orders_seen The total number of invalid orders mesh has seen
# TYPE mesh_p2p_invalid_orders_seen counter
mesh_p2p_invalid_orders_seen 0
# HELP mesh_p2p_valid_orders_already_stored The total number of valid orders mesh has already stored and rejected via p2p
# TYPE mesh_p2p_valid_orders_already_stored counter
mesh_p2p_valid_orders_already_stored 0
# HELP mesh_p2p_valid_orders_seen The total number of valid orders mesh has seen
# TYPE mesh_p2p_valid_orders_seen counter
mesh_p2p_valid_orders_seen 0
# HELP mesh_valid_orders_seen The total number of valid orders mesh has accepted via JSON RPC
# TYPE mesh_valid_orders_seen counter
mesh_valid_orders_seen 0
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 2.14
The official Go library includes metrics for the Go runtime itself, giving useful information like the number of OS threads created and garbage collection time.
A node running telemetry would expose metrics on http://localhost:<TELEMETRY_PORT>, which can then be scraped by Prometheus or ingested directly by a dashboard application.
Visualization can easily be done in Grafana or, again, by a bespoke local dashboard application.
If the node is monitored by Prometheus itself, it would be trivial to add alert rules for key metrics.
A basic example:
alert: HighOrderRejectionRate
expr: increase(mesh_p2p_invalid_orders_seen[5m]) > 1000
for: 1m
labels:
severity: low
annotations:
summary: High amount of rejected orders
description: The node has rejected more than 1000 orders in the last 5 minutes
TODO: WASM environment?
We just need to build an API on top of leveldb.Transaction that exposes transaction functionality to callers.
0x Mesh requires a WS interface where node operators can submit orders they wish to watch and submit to the network, as well as subscribe to state-changes in the orders they are storing.
While testing the Mesh JSON-RPC orders subscription endpoint, I've discovered a bug: if the client is force-killed (ctrl-c in the terminal), the Mesh node emits the following error on an endless interval:
{
"error": "write tcp4 127.0.0.1:60557->127.0.0.1:57219: write: broken pipe",
"level": "error",
"msg": "error while calling notifier.Notify",
"time": "2019-05-30T10:53:41+01:00"
}
Additional observations:
Currently, every time a Mesh node is re-started, it does two things:
However, if it's been a very long time since the node was last online, it could take a while for the node to catch up with the latest block. Instead, we should check on start-up whether we've fallen behind and fast-sync using eth_getLogs block range queries to catch up quickly with the latest block. We are able to fetch and process up to 60 blocks worth of events per query.
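Under that 60-blocks-per-query constraint, the catch-up span naturally splits into chunked eth_getLogs queries. A sketch of the chunking (the helper name is hypothetical):

```go
package main

import "fmt"

// blockRanges splits the catch-up span [from, to] into chunks of at most
// maxBlocks blocks, one eth_getLogs query per chunk. maxBlocks would be
// 60 per the observation above.
func blockRanges(from, to, maxBlocks int) [][2]int {
	var ranges [][2]int
	for start := from; start <= to; start += maxBlocks {
		end := start + maxBlocks - 1
		if end > to {
			end = to
		}
		ranges = append(ranges, [2]int{start, end})
	}
	return ranges
}

func main() {
	// Catching up 150 blocks takes three queries.
	fmt.Println(blockRanges(0, 149, 60)) // [[0 59] [60 119] [120 149]]
}
```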
The current interface to BatchValidate() is not explicit about orders that were not validated because of network issues. Let's change it to also return a response similar to AddOrdersResponse.
There should be two ways to start the Mesh Docker container:
- with the 0x_mesh main data directory mounted on the host
- using mesh-keygen to generate and store a private key to use
This issue serves as a way to track overall progress on WebAssembly/browser compatibility. A lot of progress has already been made. For example, we added WebAssembly support to pion/webrtc, which will serve as the underlying mechanism for establishing a direct connection between two browser-based peers.
Here are the remaining tasks (including those already in progress):
Deciding when a 0x order is expired can be a tricky thing. Comparing the ExpirationTimeSeconds field with the current UTC time doesn't provide the full picture. The reason is that the order's ExpirationTimeSeconds field is actually compared against the blockTime of the block within which a miner is trying to include the fill transaction.
Since Ethereum is a decentralized network of nodes that do not attempt to synchronize clocks, the only hard requirement for a blockTime is that it must be greater than its parent block's blockTime. As you might have already realized, that means the blockTime set by a miner might be before or after your Mesh node's current UTC time.
Let's recap: you can't assume current UTC time is what the miner is using for blockTime, and you have no way of knowing what blockTime they are using. So what is there to do?
You try your best. And depending on what you are using 0x Mesh for, you might want to tweak the EXPIRATION_BUFFER when running your node.
TODOs
- Add an EXPIRATION_BUFFER env var to Mesh configs and hook it up to OrderWatcher
- Pick a sensible default EXPIRATION_BUFFER, so that orders don't end up expired by the time our neighbors validate the ones we send them
For NewAssetDataDecoder(), we should really be panicking if abi.JSON(strings.NewReader(erc20AssetDataAbi)) returns an error, because erc20AssetDataAbi is a developer-set constant, and if it's incorrect, the developer should fix this immediately.
Currently, indexes will break if you use values that contain the ":" character. This is obviously pretty bad. Luckily it can be solved with a simple character escaping algorithm.
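A sketch of the simple character-escaping algorithm mentioned above (the escape sequence `\c` for ":" is an arbitrary choice for illustration):

```go
package main

import (
	"fmt"
	"strings"
)

// escape makes a value safe for use inside a ":"-separated index key by
// escaping the separator and the escape character itself.
func escape(value string) string {
	value = strings.ReplaceAll(value, `\`, `\\`)
	return strings.ReplaceAll(value, ":", `\c`)
}

// unescape reverses escape by scanning left to right, so that escaped
// backslashes are never misread as the start of an escape sequence.
func unescape(value string) string {
	var b strings.Builder
	for i := 0; i < len(value); i++ {
		if value[i] == '\\' && i+1 < len(value) {
			i++
			if value[i] == 'c' {
				b.WriteByte(':')
			} else {
				b.WriteByte(value[i])
			}
		} else {
			b.WriteByte(value[i])
		}
	}
	return b.String()
}

func main() {
	fmt.Println(escape("maker:0xabc"))                            // maker\c0xabc
	fmt.Println(unescape(escape("maker:0xabc")) == "maker:0xabc") // true
}
```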
Block watching is very much an Ethereum-specific function. The blockwatch package should be moved to be nested beneath the ethereum package.
The current implementation of OrderValidator.sol we are using to batch validate 0x orders does not support validating orders involving the MultiAssetProxy. We therefore cannot support orders involving this proxy until this contract is upgraded.
In the meantime, we should check the maker assetData of the orders we wish to validate before passing them to the contract, in order to avoid the request failing.
Unfortunately watching order validity via contract events is imperfect and there are several ways in which order fillability could change without the event watcher catching it. Because of this, we require a cleanup worker that periodically (e.g. every couple hours) verifies the fillability of all orders that have not been recently updated.
Currently, you must manually add new peers via the AddPeer RPC method. For the beta release, we would like to have some form of automated peer discovery.
The image should have everything you need to run a 0x Mesh node.
Currently the RPC API can be used to add new orders and connect to peers, but we need to add support for subscribing to order updates. Clients should be notified when:
Currently, when sending a 0x order to Mesh via mesh_addOrders, the JSON-RPC request looks like this:
{
"jsonrpc": "2.0",
"id": 2,
"method": "mesh_addOrders",
"params": [
[
{
"makerAddress": "0x6440b8c5f5a3c725eb394c7c40994afaf50a0d39",
"makerAssetData": "9HJhsAAAAAAAAAAAAAAAAMAqqjmyI/6NCg5cTyfq2Qg8dWzC",
"makerAssetAmount": 1233400000000000,
"makerFee": 0,
"takerAddress": "0x0000000000000000000000000000000000000000",
"takerAssetData": "9HJhsAAAAAAAAAAAAAAAAA2HdfZIQwZ5pwnpjSsMtiUNKIfv",
"takerAssetAmount": 1233400000000000,
"takerFee": 0,
"senderAddress": "0x0000000000000000000000000000000000000000",
"exchangeAddress": "0x4f833a24e1f95d70f028921e27040ca56e09ab0b",
"feeRecipientAddress": "0xa258b39954cef5cb142fd567a46cddb31a670124",
"expirationTimeSeconds": 1560917245,
"salt": 1545196045897,
"signature": "G2pJMCd0sLDhTvWekfz5UN+321cFrmkp4GGYUYsRBTAdTvlLG0dg5VA3i7W3dGsaKcF0KQr+lEgyTO9BEt0D16ED"
}
]
]
}
There are two issues to address here:
1. []byte fields in the SignedOrder need to be sent over-the-wire in Base64 encoding. This is different from what Ethereum developers expect (0x-prefixed and hex-encoded), and I feel strongly we should adhere to these developer norms.
2. Ethereum's JSON-RPC APIs never return uint256 values as numbers. Instead, they always return amounts hex-encoded (e.g., https://github.com/ethereum/wiki/wiki/JSON-RPC#eth_gasprice). We might need to do the same.

Some of the order events emitted by Mesh were triggered by the EventWatcher, and in those cases the state change (e.g., filled -> cancelled) can be directly linked to a specific Ethereum transaction. Both to allow consumers to verify the validity of a specific order event, and to let them pull additional information about the transaction that caused the state change, we should return the corresponding Ethereum transaction when available.
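To illustrate the Base64-vs-hex point, a small sketch that re-encodes a Base64 []byte field into the 0x-prefixed hex form Ethereum developers expect (using the makerAssetData from the example request above):

```go
package main

import (
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// toHex re-encodes a Base64-encoded []byte field into the 0x-prefixed
// hex form conventional in the Ethereum ecosystem.
func toHex(b64 string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(b64)
	if err != nil {
		return "", err
	}
	return "0x" + hex.EncodeToString(raw), nil
}

func main() {
	// The makerAssetData from the example request above; the decoded hex
	// begins with the ERC20 asset proxy ID 0xf47261b0.
	s, err := toHex("9HJhsAAAAAAAAAAAAAAAAMAqqjmyI/6NCg5cTyfq2Qg8dWzC")
	fmt.Println(s, err)
}
```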
New OrderInfo struct:
type OrderInfo struct {
OrderHash common.Hash
SignedOrder *SignedOrder
FillableTakerAssetAmount *big.Int
OrderStatus OrderStatus
TxHash common.Hash
}
Since common.Hash is an alias for [32]byte, if there is no corresponding TxHash for an event (e.g., a new order discovered in the P2P network), the zero-value is set.
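A sketch of the zero-value convention (using a local [32]byte alias so the example doesn't depend on go-ethereum; the helper name is hypothetical):

```go
package main

import "fmt"

// Hash stands in for go-ethereum's common.Hash ([32]byte).
type Hash [32]byte

// hasTxHash reports whether an order event carries a corresponding
// Ethereum transaction, using the zero-value convention described above.
func hasTxHash(h Hash) bool {
	return h != Hash{}
}

func main() {
	var none Hash
	fmt.Println(hasTxHash(none))       // false: no corresponding transaction
	fmt.Println(hasTxHash(Hash{0x01})) // true
}
```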
We are currently pulling in some part of go-ethereum that requires cgo to compile. This is a blocker for browser support, so we must investigate whether we can remove the offending dependency.
Coordinator orders require an additional validation step: checking if the order has been soft-cancelled via the Coordinator server endpoint. In order to properly prune Coordinator orders, we'd need to add this check to our validation logic.
For logging errors, sometimes we use the key "err" and sometimes we use the key "error". It's important to be consistent here because it will make monitoring and setting up the ELK stack much easier.
This is probably best implemented by a helper function with the signature:
func getContractAddressesForNetworkID(networkID int) (ContractNameToAddress, error)
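A minimal sketch of that helper (the mainnet Exchange address is taken from the example request earlier in this document; the Ganache entry is a placeholder, not a real deployment):

```go
package main

import "fmt"

// ContractNameToAddress maps a contract name to its deployed address.
type ContractNameToAddress map[string]string

// addressesByNetworkID is illustrative; the Ganache (50) address below
// is a placeholder.
var addressesByNetworkID = map[int]ContractNameToAddress{
	1:  {"Exchange": "0x4f833a24e1f95d70f028921e27040ca56e09ab0b"},
	50: {"Exchange": "0x0000000000000000000000000000000000000000"},
}

// getContractAddressesForNetworkID returns the contract addresses for
// the given network ID, or an error for unsupported networks.
func getContractAddressesForNetworkID(networkID int) (ContractNameToAddress, error) {
	addrs, ok := addressesByNetworkID[networkID]
	if !ok {
		return nil, fmt.Errorf("invalid network ID: %d", networkID)
	}
	return addrs, nil
}

func main() {
	addrs, err := getContractAddressesForNetworkID(1)
	fmt.Println(addrs["Exchange"], err)
}
```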
See #92 (comment) and #92 (comment)
The documentation for leveldb.DB.Close states:
It is not safe to close a DB until all outstanding iterators are released.
Currently, the onus is on callers to make sure they aren't calling any other methods when calling Close. We could improve usability by tracking this automatically.
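One way to track this automatically is a WaitGroup that counts outstanding iterators; a sketch (names are illustrative, not the actual db package API):

```go
package main

import (
	"fmt"
	"sync"
)

// DB tracks outstanding iterators so Close can wait for them instead of
// relying on callers to coordinate.
type DB struct {
	iterators sync.WaitGroup
}

// Iterator must be released when the caller is done with it.
type Iterator struct {
	release *sync.WaitGroup
}

// NewIterator registers a new outstanding iterator.
func (db *DB) NewIterator() *Iterator {
	db.iterators.Add(1)
	return &Iterator{release: &db.iterators}
}

// Release marks the iterator as no longer outstanding.
func (it *Iterator) Release() {
	it.release.Done()
}

// Close blocks until every iterator has been released, satisfying the
// leveldb requirement without any caller-side bookkeeping.
func (db *DB) Close() {
	db.iterators.Wait()
}

func main() {
	db := &DB{}
	it := db.NewIterator()
	it.Release()
	db.Close()
	fmt.Println("closed safely")
}
```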
Since the Dockerfile is only used for server-side deployments and the cgo versions of our dependencies are more performant than the non-cgo versions, let's switch to compiling with cgo enabled when building the Docker image.
Perhaps we could use the frolvlad/alpine-glibc base image.
When we first started building 0x Mesh, we weren't certain about the best practices for writing asynchronous APIs in Go and had to consider a wide variety of unexpected use cases and constraints. At this point, enough of the pieces are in place that we should be able to go back and choose a general set of guidelines to apply to all of our asynchronous code.
Here are some general rules I'm proposing:
- Exported APIs should block, like http.ListenAndServe, which blocks until there is an error. Callers can wrap this in a goroutine if needed.
- Long-running goroutines should stop when signaled via context.Done. This usually necessitates accepting a context.Context as an argument.
- Never use context.Background except in tests and main.go.
- Don't implement a Close method. Just cancel the context.
- Don't accept timeouts as a time.Duration. Just use a context and expect the caller to use context.WithTimeout if they need timeouts.
There are of course going to be some exceptions (e.g. go-ethereum/rpc requires us to have an exported method which returns a channel), but I think using a set of guidelines will make it a lot easier for us to maintain the code going forward.
We should probably have typed errors for common cases like NotFoundError and TypeError.
AFAICT, we don't currently do schema validations for orders received from peers (i.e., making sure they have the correct structure and all the required fields). We could use JSON Schemas for this. However, we will probably eventually switch to a more efficient encoding over the wire, in which case, some kind of schema validation that matches our encoding format would be preferred.
Currently the orderwatch does not persist orders to the DB. We want it to persist orders so that it can be re-started without requiring all orders to be re-validated.
Currently we simply check whether an incoming message from a peer can be unmarshalled into a SignedOrder struct. In order to prevent the batchValidate() method from panicking when calling the getOrdersAndTradersInfo() contract method, we need to perform some additional validation on a SignedOrder's []byte fields, e.g. checking that the assetDatas are decodable and valid. If these conditions fail, the message should be immediately discarded and not forwarded on to the batchValidate() call.
Rename:
expirationwatch.go -> expiration_watcher.go
expirationwatch_test.go -> expiration_watcher_test.go
zeroex/decoder -> asset_data_decoder.go
orderwatch/decoder -> event_decoder.go
blockwatch/watcher -> block_watcher.go
orderwatch/watcher -> order_watcher.go
At the moment, it only supports Mainnet and Ganache. Supporting Ropsten might be useful for stress-testing the fill integration on a remote server that doesn't support Ganache, without spending gas on filling the orders.
We have designed Mesh such that if it crashes for any reason, it resumes from where it left off upon re-start. This means clients will receive all order events despite the disruption. But what if the client is disrupted? Order events from their subscriptions won't queue up until they return.
In order to handle this scenario, we will add a paginated method for fetching all orders currently in the Mesh database. This will allow a client that has crashed to update all the orders in their DB/in-memory store and recover to an up-to-date snapshot of the valid orders stored by the Mesh node.
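A sketch of the page/perPage semantics such a method could use (names are hypothetical, and a real implementation would page over the database rather than a slice):

```go
package main

import "fmt"

// getOrders returns one page of orders from a snapshot, or nil when the
// page is past the end.
func getOrders(all []string, page, perPage int) []string {
	start := page * perPage
	if start >= len(all) {
		return nil
	}
	end := start + perPage
	if end > len(all) {
		end = len(all)
	}
	return all[start:end]
}

func main() {
	orders := []string{"a", "b", "c", "d", "e"}
	// A recovering client pages until it receives an empty page.
	fmt.Println(getOrders(orders, 0, 2)) // [a b]
	fmt.Println(getOrders(orders, 2, 2)) // [e]
	fmt.Println(getOrders(orders, 3, 2)) // []
}
```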
Currently the getOrderInfosAndTraderInfos method in the OrderValidator library we are using doesn't validate signatures. We should upgrade to a contract that does.
The Architecture Doc describes an "order storing algorithm" which involves checking the ETH balance of peer who created the order. We need to implement this for the beta release.
There is an outstanding issue with the current implementation of getOrdersAndTradersInfo() on OrderValidator.sol. If any of the tokens involved in an order do not exist (e.g., there is no code at the specified address), the call reverts without any error message and the method returns an empty string. We need to change getOrdersAndTradersInfo() to check for the existence of the contracts before calling their balanceOf and allowance functions, so that it returns a balance/allowance of 0 if the contract doesn't exist rather than reverting. That way, this edge case won't let one faulty order cause the entire batch validation of up to 500 orders to fail.
We should implement some basic rate-limiting that gives a negative score and eventually disconnects from peers that send too many messages over a short period of time. We should probably start with strict limits and relax them as usage grows and we implement more sophisticated incentive mechanisms.
Currently, the WS JSON-RPC interface requires that 0x orders be submitted one at a time. This is very inefficient since each request calls BatchValidate() under the hood with a single order. If one could submit multiple orders in a single request, they could also be validated in fewer batches.
Some developers might want to use Mesh as a replacement for 0xorg/order-watcher and for that use-case, the p2p networking isn't required.
This is also useful when stress-testing the order-watching functionality of the 0x Mesh node.