0xproject / 0x-mesh
A peer-to-peer network for sharing 0x orders
Home Page: https://0x-org.gitbook.io/mesh/
License: Other
The db package should include helper methods for indexing and querying common types like bool, string, and int. Currently, we rely on callers to convert these types to/from bytes.
Currently, 0x Mesh uses FloodSub. For the beta release, we want to implement WeijieSub (i.e. the "order sharing algorithm" described in the Architecture Doc).
Currently one can only call mesh_addOrders via a WebSocket connection. It should also be reachable via HTTP.
Currently we use JSON, but it would be better to make this configurable. The default should probably also be a more space-efficient encoding.
Right now we generate a new private key and peer ID each time 0x Mesh starts. We should use persistent identities instead, which probably involves loading in a private key from a file.
See Line 200 in 9c12e0b.
A Mesh node would expose telemetry information, such as Go runtime metrics and network/Mesh-specific information, that can be used to monitor its health and status.
To gather and expose those metrics it would use Prometheus and its official Go client library.
Rather than being pushed, those metrics would be scraped from the node itself over HTTP.
Whether or not a running node will expose telemetry is configured via ENV_VARS:
type Config struct {
    ...
    // RunTelemetry is whether to expose mesh metrics
    RunTelemetry bool `envvar:"RUN_TELEMETRY" default:"false"`
    // TelemetryPort is the port on which metrics are exposed
    // :<TelemetryPort>/metrics
    TelemetryPort int `envvar:"TELEMETRY_PORT" default:"3000"`
}
To add a custom metric, we first define it:
package telemetry
import (
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
)
var (
AddOrdersRequests = promauto.NewCounter(prometheus.CounterOpts{
Name: "mesh_jrpc_request_add_orders_total",
Help: "The total number of mesh_addOrders JRPC requests",
})
AddPeerRequests = promauto.NewCounter(prometheus.CounterOpts{
Name: "mesh_jrpc_request_add_peer_total",
Help: "The total number of mesh_addPeer JRPC requests",
})
)
And then invoke its method; in the case of a Counter, we increment it:
// AddOrders calls rpcHandler.AddOrders and returns the validation results.
func (s *rpcService) AddOrders(orders []*zeroex.SignedOrder) (*zeroex.ValidationResults, error) {
telemetry.AddOrdersRequests.Inc()
return s.rpcHandler.AddOrders(orders)
}
Example Metrics pulled from the example implementation:
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 950272
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 7.25486e+07
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 13
# HELP mesh_invalid_orders_seen The total number of invalid orders mesh has rejected via JSON RPC
# TYPE mesh_invalid_orders_seen counter
mesh_invalid_orders_seen 0
# HELP mesh_jrpc_request_add_orders_total The total number of mesh_addOrders JRPC requests
# TYPE mesh_jrpc_request_add_orders_total counter
mesh_jrpc_request_add_orders_total 173
# HELP mesh_jrpc_request_add_peer_total The total number of mesh_addPeer JRPC requests
# TYPE mesh_jrpc_request_add_peer_total counter
mesh_jrpc_request_add_peer_total 0
# HELP mesh_p2p_invalid_orders_seen The total number of invalid orders mesh has seen
# TYPE mesh_p2p_invalid_orders_seen counter
mesh_p2p_invalid_orders_seen 0
# HELP mesh_p2p_valid_orders_already_stored The total number of valid orders mesh has already stored and rejected via p2p
# TYPE mesh_p2p_valid_orders_already_stored counter
mesh_p2p_valid_orders_already_stored 0
# HELP mesh_p2p_valid_orders_seen The total number of valid orders mesh has seen
# TYPE mesh_p2p_valid_orders_seen counter
mesh_p2p_valid_orders_seen 0
# HELP mesh_valid_orders_seen The total number of valid orders mesh has accepted via JSON RPC
# TYPE mesh_valid_orders_seen counter
mesh_valid_orders_seen 0
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 2.14
The official Go library includes metrics for the Go runtime itself, giving useful information like the number of OS threads created and garbage collection time.
A node running telemetry would expose metrics on http://localhost:<TELEMETRY_PORT>, which can then be scraped by Prometheus or ingested directly by a dashboard application.
Visualization can easily be done in Grafana or, again, by a bespoke local dashboard application.
If the node is monitored by Prometheus itself, it would be trivial to add alert rules for key metrics.
A basic example:
alert: HighOrderRejectionRate
expr: increase(mesh_p2p_invalid_orders_seen[5m]) > 1000
for: 1m
labels:
severity: low
annotations:
summary: High amount of rejected orders
description: The node has rejected more than 1000 orders in the last 5 minutes
TODO: WASM environment?
We just need to build an API on top of leveldb.Transaction that exposes transaction functionality to callers.
0x Mesh requires a WS interface where node operators can submit orders they wish to watch and submit to the network, as well as subscribe to state-changes in the orders they are storing.
While testing the Mesh JSON-RPC orders subscription endpoint, I've discovered a bug: if the client is force-killed (ctrl-c in the terminal), the Mesh node emits the following error on an endless interval:
{
"error": "write tcp4 127.0.0.1:60557->127.0.0.1:57219: write: broken pipe",
"level": "error",
"msg": "error while calling notifier.Notify",
"time": "2019-05-30T10:53:41+01:00"
}
Additional observations:
Currently, every time a Mesh node is re-started, it does two things:
However, if it's been a very long time since the node was last online, it could take a while for the node to catch up with the latest block. Instead, we should check on start-up whether we've fallen behind and fast-sync using eth_getLogs block range queries to catch up quickly with the latest block. We are able to fetch and process up to 60 blocks worth of events per query.
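Under that 60-blocks-per-query constraint, the catch-up span naturally splits into chunked eth_getLogs queries. A sketch of the chunking (the helper name is hypothetical):

```go
package main

import "fmt"

// blockRanges splits the catch-up span [from, to] into chunks of at most
// maxBlocks blocks, one eth_getLogs query per chunk. maxBlocks would be
// 60 per the observation above.
func blockRanges(from, to, maxBlocks int) [][2]int {
	var ranges [][2]int
	for start := from; start <= to; start += maxBlocks {
		end := start + maxBlocks - 1
		if end > to {
			end = to
		}
		ranges = append(ranges, [2]int{start, end})
	}
	return ranges
}

func main() {
	// Catching up 150 blocks takes three queries.
	fmt.Println(blockRanges(0, 149, 60)) // [[0 59] [60 119] [120 149]]
}
```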
The current interface to BatchValidate() is not explicit about orders that were not validated because of network issues. Let's change it to also return a response similar to AddOrdersResponse.
There should be two ways to start the Mesh Docker container:
- with the 0x_mesh main data directory mounted on the host
- using mesh-keygen to generate and store a private key to use
This issue serves as a way to track overall progress on WebAssembly/browser compatibility. A lot of progress has already been made. For example, we added WebAssembly support to pion/webrtc, which will serve as the underlying mechanism for establishing a direct connection between two browser-based peers.
Here are the remaining tasks (including those already in progress):
Deciding when a 0x order is expired can be a tricky thing. Comparing the ExpirationTimeSeconds field with the current UTC time doesn't provide the full picture. The reason is that the order's ExpirationTimeSeconds field is actually compared against the blockTime of the block within which a miner is trying to include the fill transaction.
Since Ethereum is a decentralized network of nodes that do not attempt to synchronize clocks, the only hard requirement for a blockTime is that it must be greater than its parent block's blockTime. As you might have already realized, that means the blockTime set by a miner might be before or after your Mesh node's current UTC time.
Let's recap: you can't assume current UTC time is what the miner is using for blockTime, and you have no way of knowing what blockTime they are using. So what is there to do?
You try your best. And depending on what you are using 0x Mesh for, you might want to tweak the EXPIRATION_BUFFER when running your node.
TODOs
- Add an EXPIRATION_BUFFER env var to Mesh configs and hook it up to OrderWatcher
- Pick a sensible default EXPIRATION_BUFFER, so that orders don't end up expired by the time our neighbors validate the ones we send them
For NewAssetDataDecoder(), we should really be panicking if abi.JSON(strings.NewReader(erc20AssetDataAbi)) returns an error, because erc20AssetDataAbi is a developer-set constant, and if it's incorrect, the developer should fix this immediately.
Currently, indexes will break if you use values that contain the ":" character. This is obviously pretty bad. Luckily it can be solved with a simple character escaping algorithm.
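A sketch of the simple character-escaping algorithm mentioned above (the escape sequence `\c` for ":" is an arbitrary choice for illustration):

```go
package main

import (
	"fmt"
	"strings"
)

// escape makes a value safe for use inside a ":"-separated index key by
// escaping the separator and the escape character itself.
func escape(value string) string {
	value = strings.ReplaceAll(value, `\`, `\\`)
	return strings.ReplaceAll(value, ":", `\c`)
}

// unescape reverses escape by scanning left to right, so that escaped
// backslashes are never misread as the start of an escape sequence.
func unescape(value string) string {
	var b strings.Builder
	for i := 0; i < len(value); i++ {
		if value[i] == '\\' && i+1 < len(value) {
			i++
			if value[i] == 'c' {
				b.WriteByte(':')
			} else {
				b.WriteByte(value[i])
			}
		} else {
			b.WriteByte(value[i])
		}
	}
	return b.String()
}

func main() {
	fmt.Println(escape("maker:0xabc"))                            // maker\c0xabc
	fmt.Println(unescape(escape("maker:0xabc")) == "maker:0xabc") // true
}
```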
Block watching is very much an Ethereum-specific function. The blockwatch package should be moved to be nested beneath the ethereum package.
The current implementation of OrderValidator.sol we are using to batch validate 0x orders does not support validating orders involving the MultiAssetProxy. We therefore cannot support orders involving this proxy until this contract is upgraded.
In the meantime, we should check the maker assetData of the orders we wish to validate before passing them to the contract, in order to avoid the request failing.
Unfortunately watching order validity via contract events is imperfect and there are several ways in which order fillability could change without the event watcher catching it. Because of this, we require a cleanup worker that periodically (e.g. every couple hours) verifies the fillability of all orders that have not been recently updated.
Currently, you must manually add new peers via the AddPeer RPC method. For the beta release, we would like to have some form of automated peer discovery.
The image should have everything you need to run a 0x Mesh node.
Currently the RPC API can be used to add new orders and connect to peers, but we need to add support for subscribing to order updates. Clients should be notified when:
Currently, when sending a 0x order to Mesh via mesh_addOrders, the JSON-RPC request looks like this:
{
"jsonrpc": "2.0",
"id": 2,
"method": "mesh_addOrders",
"params": [
[
{
"makerAddress": "0x6440b8c5f5a3c725eb394c7c40994afaf50a0d39",
"makerAssetData": "9HJhsAAAAAAAAAAAAAAAAMAqqjmyI/6NCg5cTyfq2Qg8dWzC",
"makerAssetAmount": 1233400000000000,
"makerFee": 0,
"takerAddress": "0x0000000000000000000000000000000000000000",
"takerAssetData": "9HJhsAAAAAAAAAAAAAAAAA2HdfZIQwZ5pwnpjSsMtiUNKIfv",
"takerAssetAmount": 1233400000000000,
"takerFee": 0,
"senderAddress": "0x0000000000000000000000000000000000000000",
"exchangeAddress": "0x4f833a24e1f95d70f028921e27040ca56e09ab0b",
"feeRecipientAddress": "0xa258b39954cef5cb142fd567a46cddb31a670124",
"expirationTimeSeconds": 1560917245,
"salt": 1545196045897,
"signature": "G2pJMCd0sLDhTvWekfz5UN+321cFrmkp4GGYUYsRBTAdTvlLG0dg5VA3i7W3dGsaKcF0KQr+lEgyTO9BEt0D16ED"
}
]
]
}
There are two issues to address here:
1. []byte fields in the SignedOrder need to be sent over-the-wire in Base64 encoding. This is different from what Ethereum developers expect (0x-prefixed and hex-encoded), and I feel strongly we should adhere to these developer norms.
2. Ethereum's JSON-RPC APIs never return uint256 values as numbers. Instead, they always return amounts hex-encoded (e.g., https://github.com/ethereum/wiki/wiki/JSON-RPC#eth_gasprice). We might need to do the same.

Some of the order events emitted by Mesh were triggered by the EventWatcher, and in those cases the state change (e.g., filled -> cancelled) can be directly linked to a specific Ethereum transaction. Both to allow consumers to verify the validity of a specific order event, and to let them pull additional information about the transaction that caused the state change, we should return the corresponding Ethereum transaction when available.
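To illustrate the Base64-vs-hex point, a small sketch that re-encodes a Base64 []byte field into the 0x-prefixed hex form Ethereum developers expect (using the makerAssetData from the example request above):

```go
package main

import (
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// toHex re-encodes a Base64-encoded []byte field into the 0x-prefixed
// hex form conventional in the Ethereum ecosystem.
func toHex(b64 string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(b64)
	if err != nil {
		return "", err
	}
	return "0x" + hex.EncodeToString(raw), nil
}

func main() {
	// The makerAssetData from the example request above; the decoded hex
	// begins with the ERC20 asset proxy ID 0xf47261b0.
	s, err := toHex("9HJhsAAAAAAAAAAAAAAAAMAqqjmyI/6NCg5cTyfq2Qg8dWzC")
	fmt.Println(s, err)
}
```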
New OrderInfo struct:
type OrderInfo struct {
OrderHash common.Hash
SignedOrder *SignedOrder
FillableTakerAssetAmount *big.Int
OrderStatus OrderStatus
TxHash common.Hash
}
Since common.Hash is an alias for [32]byte, if there is no corresponding TxHash for an event (e.g., a new order discovered in the P2P network), the zero-value is set.
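A sketch of the zero-value convention (using a local [32]byte alias so the example doesn't depend on go-ethereum; the helper name is hypothetical):

```go
package main

import "fmt"

// Hash stands in for go-ethereum's common.Hash ([32]byte).
type Hash [32]byte

// hasTxHash reports whether an order event carries a corresponding
// Ethereum transaction, using the zero-value convention described above.
func hasTxHash(h Hash) bool {
	return h != Hash{}
}

func main() {
	var none Hash
	fmt.Println(hasTxHash(none))       // false: no corresponding transaction
	fmt.Println(hasTxHash(Hash{0x01})) // true
}
```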
We are currently pulling in some part of go-ethereum that requires cgo to compile. This is a blocker for browser support, so we must investigate whether we can remove the offending dependency.
Coordinator orders require an additional validation step: checking if the order has been soft-cancelled via the Coordinator server endpoint. In order to properly prune Coordinator orders, we'd need to add this check to our validation logic.
For logging errors, sometimes we use the key "err" and sometimes we use the key "error". It's important to be consistent here because it will make monitoring and setting up the ELK stack much easier.
This is probably best implemented by a helper function with the signature:
func getContractAddressesForNetworkID(networkID int) (ContractNameToAddress, error)
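A minimal sketch of that helper (the mainnet Exchange address is taken from the example request earlier in this document; the Ganache entry is a placeholder, not a real deployment):

```go
package main

import "fmt"

// ContractNameToAddress maps a contract name to its deployed address.
type ContractNameToAddress map[string]string

// addressesByNetworkID is illustrative; the Ganache (50) address below
// is a placeholder.
var addressesByNetworkID = map[int]ContractNameToAddress{
	1:  {"Exchange": "0x4f833a24e1f95d70f028921e27040ca56e09ab0b"},
	50: {"Exchange": "0x0000000000000000000000000000000000000000"},
}

// getContractAddressesForNetworkID returns the contract addresses for
// the given network ID, or an error for unsupported networks.
func getContractAddressesForNetworkID(networkID int) (ContractNameToAddress, error) {
	addrs, ok := addressesByNetworkID[networkID]
	if !ok {
		return nil, fmt.Errorf("invalid network ID: %d", networkID)
	}
	return addrs, nil
}

func main() {
	addrs, err := getContractAddressesForNetworkID(1)
	fmt.Println(addrs["Exchange"], err)
}
```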
See #92 (comment) and #92 (comment)
The documentation for leveldb.DB.Close states:
It is not safe to close a DB until all outstanding iterators are released.
Currently, the onus is on callers to make sure they aren't calling any other methods when calling Close. We could improve usability by tracking this automatically.
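One way to track this automatically is a WaitGroup that counts outstanding iterators; a sketch (names are illustrative, not the actual db package API):

```go
package main

import (
	"fmt"
	"sync"
)

// DB tracks outstanding iterators so Close can wait for them instead of
// relying on callers to coordinate.
type DB struct {
	iterators sync.WaitGroup
}

// Iterator must be released when the caller is done with it.
type Iterator struct {
	release *sync.WaitGroup
}

// NewIterator registers a new outstanding iterator.
func (db *DB) NewIterator() *Iterator {
	db.iterators.Add(1)
	return &Iterator{release: &db.iterators}
}

// Release marks the iterator as no longer outstanding.
func (it *Iterator) Release() {
	it.release.Done()
}

// Close blocks until every iterator has been released, satisfying the
// leveldb requirement without any caller-side bookkeeping.
func (db *DB) Close() {
	db.iterators.Wait()
}

func main() {
	db := &DB{}
	it := db.NewIterator()
	it.Release()
	db.Close()
	fmt.Println("closed safely")
}
```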
Since the Dockerfile is only used for server-side deployments and the cgo versions of our dependencies are more performant than the non-cgo versions, let's switch to compiling with cgo enabled when building the Docker image.
Perhaps we could use the frolvlad/alpine-glibc base image.
When we first started building 0x Mesh, we weren't certain about the best practices for writing asynchronous APIs in Go and had to consider a wide variety of unexpected use cases and constraints. At this point, enough of the pieces are in place that we should be able to go back and choose a general set of guidelines to apply to all of our asynchronous code.
Here are some general rules I'm proposing:
- Exported APIs should block, like http.ListenAndServe, which blocks until there is an error. Callers can wrap this in a goroutine if needed.
- Long-running goroutines should stop when signaled via context.Done. This usually necessitates accepting a context.Context as an argument.
- Never use context.Background except in tests and main.go.
- Don't implement a Close method. Just cancel the context.
- Don't accept timeouts as a time.Duration. Just use a context and expect the caller to use context.WithTimeout if they need timeouts.
There are of course going to be some exceptions (e.g. go-ethereum/rpc requires us to have an exported method which returns a channel), but I think using a set of guidelines will make it a lot easier for us to maintain the code going forward.
We should probably have typed errors for common cases like NotFoundError and TypeError.
AFAICT, we don't currently do schema validations for orders received from peers (i.e., making sure they have the correct structure and all the required fields). We could use JSON Schemas for this. However, we will probably eventually switch to a more efficient encoding over the wire, in which case, some kind of schema validation that matches our encoding format would be preferred.
Currently the orderwatch does not persist orders to the DB. We want it to persist orders so that it can be re-started without requiring all orders to be re-validated.
Currently we simply check whether an incoming message from a peer can be unmarshalled into a SignedOrder struct. In order to prevent the batchValidate() method from panicking when calling the getOrdersAndTradersInfo() contract method, we need to perform some additional validation on a SignedOrder's []byte fields, e.g. checking that the assetDatas are decodable and valid. If these conditions fail, the message should be immediately discarded and not forwarded on to the batchValidate() call.
Rename:
expirationwatch.go -> expiration_watcher.go
expirationwatch_test.go -> expiration_watcher_test.go
zeroex/decoder -> asset_data_decoder.go
orderwatch/decoder -> event_decoder.go
blockwatch/watcher -> block_watcher.go
orderwatch/watcher -> order_watcher.go
At the moment, it only supports Mainnet and Ganache. Supporting Ropsten might be useful for stress-testing the fill integration on a remote server that doesn't support Ganache, without spending gas on filling the orders.
We have designed Mesh such that if it crashes for any reason, it resumes from where it left off upon re-start. This means clients will receive all order events despite the disruption. But what if the client is disrupted? Order events from their subscriptions won't queue up until they return.
In order to handle this scenario, we will add a paginated method for fetching all orders currently in the Mesh database. This will allow a client that has crashed to update all the orders in their DB/in-memory store and recover to an up-to-date snapshot of the valid orders stored by the Mesh node.
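A sketch of the page/perPage semantics such a method could use (names are hypothetical, and a real implementation would page over the database rather than a slice):

```go
package main

import "fmt"

// getOrders returns one page of orders from a snapshot, or nil when the
// page is past the end.
func getOrders(all []string, page, perPage int) []string {
	start := page * perPage
	if start >= len(all) {
		return nil
	}
	end := start + perPage
	if end > len(all) {
		end = len(all)
	}
	return all[start:end]
}

func main() {
	orders := []string{"a", "b", "c", "d", "e"}
	// A recovering client pages until it receives an empty page.
	fmt.Println(getOrders(orders, 0, 2)) // [a b]
	fmt.Println(getOrders(orders, 2, 2)) // [e]
	fmt.Println(getOrders(orders, 3, 2)) // []
}
```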
Currently the getOrderInfosAndTraderInfos method in the OrderValidator library we are using doesn't validate signatures. We should upgrade to a contract that does.
The Architecture Doc describes an "order storing algorithm" which involves checking the ETH balance of peer who created the order. We need to implement this for the beta release.
There is an outstanding issue with the current implementation of getOrdersAndTradersInfo() on OrderValidator.sol. If any of the tokens involved in an order do not exist (e.g., there is no code at the specified address), the call reverts without any error message and the method returns an empty string. We need to change getOrdersAndTradersInfo() to check for the existence of the contracts before calling their balanceOf and allowance functions, so that it returns a balance/allowance of 0 if the contract doesn't exist rather than reverting. That way, this edge case won't let one faulty order cause the entire batch validation of up to 500 orders to fail.
We should implement some basic rate-limiting that gives a negative score and eventually disconnects from peers that send too many messages over a short period of time. We should probably start with strict limits and relax them as usage grows and we implement more sophisticated incentive mechanisms.
Currently, the WS JSON-RPC interface requires that 0x orders be submitted one at a time. This is very inefficient since each request calls BatchValidate() under the hood with a single order. If one could submit multiple orders in a single request, they could also be validated in fewer batches.
Some developers might want to use Mesh as a replacement for 0xorg/order-watcher and for that use-case, the p2p networking isn't required.
This is also useful when stress-testing the order-watching functionality of the 0x Mesh node.