
Mycelium

Mycelium is an IPv6 overlay network written in Rust. Each node that joins the overlay network will receive an overlay network IP in the 400::/7 range.

Features

  • Mycelium is locality aware: it looks for the shortest path between nodes
  • All traffic between the nodes is end-to-end encrypted
  • Traffic can be routed over nodes of friends, in a location-aware way
  • If a physical link goes down, Mycelium will automatically reroute your traffic
  • The IP address is IPv6 and is linked to the node's private key
  • A simple, reliable message bus is implemented on top of Mycelium
  • Mycelium supports multiple transport protocols (QUIC, TCP, ...), and we are working on hole punching for QUIC, which means P2P traffic without middlemen for NATted networks (e.g. most homes)
  • Scalability is very important for us; we tried many overlay networks before and got stuck on all of them, so we are trying to design a network which scales to a planetary level
  • You can run Mycelium without a TUN interface and only use it as a reliable message bus.

We are looking for lots of testers to push the system

See here for the documentation.

Running

Currently, Linux, macOS and Windows are supported.

First, get a usable binary, either by downloading an artifact from a release, or by checking out and building the code yourself.

On Windows, you must have wintun.dll in the same directory you are executing the binary from.

Once you have a usable binary, simply start it. If you want to connect to other nodes, you can specify their listening address as part of the command (combined with the protocol they are listening on, usually TCP). Check the next section if you want to connect to hosted public nodes.

mycelium --peers tcp://188.40.132.242:9651 quic://185.69.166.8:9651

# Another example, using a different TUN interface in case utun3 (the default) is already in use.
# Note that we use sudo here, e.g. on macOS.
sudo mycelium --peers tcp://188.40.132.242:9651 quic://185.69.166.8:9651 --tun-name utun9

By default, the node will listen on port 9651, though this can be overwritten with the -p flag.

To check your own info

mycelium inspect --json
{
  "publicKey": "abd16194646defe7ad2318a0f0a69eb2e3fe939c3b0b51cf0bb88bb8028ecd1d",
  "address": "5c4:c176:bf44:b2ab:5e7e:f6a:b7e2:11ca"
}
# Test that the network works by pinging anyone in the network
ping6 5c4:c176:bf44:b2ab:5e7e:f6a:b7e2:11ca

The node uses an x25519 key pair from which its identity is derived. The private key of this key pair is saved in a local file (32 bytes in binary format). You can specify the path to this file with the -k flag. By default, the file is saved in the current working directory as priv_key.bin.

Running without TUN interface

It is possible to run the system without creating a TUN interface, by starting with the --no-tun flag. Obviously, this means that your node won't be able to send or receive L3 traffic. There is no interface to send packets on, and consequently no interface to send received packets out of. From the point of view of other nodes, your node will simply drop all incoming L3 traffic destined for it. The node will still route traffic as normal: it takes part in routing, exchanges route info, and forwards packets not intended for itself.

The node also still allows access to the message subsystem.

Hosted public nodes

A couple of public nodes are provided, which can be freely connected to. This allows anyone to join the global network. These are hosted in 3 geographic regions, reachable over both IPv4 and IPv6, and support both the TCP and QUIC protocols. The nodes are the following:

Node ID | Region | IPv4           | IPv6                               | TCP port | QUIC port
01      | DE     | 188.40.132.242 | 2a01:4f8:221:1e0b::2               | 9651     | 9651
02      | DE     | 136.243.47.186 | 2a01:4f8:212:fa6::2                | 9651     | 9651
03      | BE     | 185.69.166.7   | 2a02:1802:5e:0:8478:51ff:fee2:3331 | 9651     | 9651
04      | BE     | 185.69.166.8   | 2a02:1802:5e:0:8c9e:7dff:fec9:f0d2 | 9651     | 9651
05      | FI     | 65.21.231.58   | 2a01:4f9:6a:1dc5::2                | 9651     | 9651
06      | FI     | 65.109.18.113  | 2a01:4f9:5a:1042::2                | 9651     | 9651

These nodes are all interconnected, so 2 peers who each connect to a different node (or set of disjoint nodes) will still be able to reach each other. However, for optimal performance, it is recommended to connect to all of the above at once. An example connection string could be:

--peers tcp://188.40.132.242:9651 "tcp://[2a01:4f8:212:fa6::2]:9651" quic://185.69.166.7:9651 "tcp://[2a02:1802:5e:0:8c9e:7dff:fec9:f0d2]:9651" tcp://65.21.231.58:9651 "quic://[2a01:4f9:5a:1042::2]:9651"

It is up to the user to decide which peers to use, and over which protocol. Note that quoting may or may not be required, depending on which shell is being used.

Private network

Mycelium supports running a private network, in which you must know the network name and a PSK (pre-shared key) to connect to nodes in the network. For more info, check out the relevant docs.

API

The node starts an HTTP API, which by default listens on localhost:8989. A different listening address can be specified on the CLI when starting the system through the --api-server-addr flag. The API allows access to send and receive messages, and will later be expanded to allow admin functionality on the system. Note that messages are sent using the identity of the node, and a future admin API can be used to change the system behavior. As such, care should be taken that this API is not accessible to unauthorized users.

Message system

A message system is provided which allows users to send a message, which is essentially just "some data", to a remote. Since the system is end-to-end encrypted, a receiver of a message is sure of the authenticity and confidentiality of the content. The system does not interpret the data in any way and handles it as an opaque block of bytes. Messages are sent with a deadline. This means the system continuously tries to send (part of) the message, until it either succeeds or the deadline expires. This happens similarly to the way TCP handles data. Messages are transmitted in chunks, which are embedded in the same data stream used by L3 packets. As such, intermediate nodes can't distinguish between regular L3 and message data.

The primary way to interact with the message system is through the API. The message API is documented in an OpenAPI spec in the docs folder. For some more info about how to use the message system, see the message docs.

Inspecting node keys

Using the inspect subcommand, you can view the address associated with a public key. If no public key is provided, the node will show its own public key. In either case, the derived address is also printed. You can specify the path to the private key with the -k flag. If the file does not exist, a new private key will be generated. The optional --json flag can be used to print the information in json format.

mycelium inspect a47c1d6f2a15b2c670d3a88fbe0aeb301ced12f7bcb4c8e3aa877b20f8559c02
Public key: a47c1d6f2a15b2c670d3a88fbe0aeb301ced12f7bcb4c8e3aa877b20f8559c02
Address: 47f:b2c5:a944:4dad:9cb1:da4:8bf7:7e65
mycelium inspect --json
{
  "publicKey": "955bf6bea5e1150fd8e270c12e5b2fc08f08f7c5f3799d10550096cc137d671b",
  "address": "54f:b680:ba6e:7ced:355f:346f:d97b:eecb"
}

Developing

This project is built in Rust, and you must have a Rust compiler to build the code yourself. Please refer to the official Rust documentation for information on how to install rustc and cargo. Aside from the Rust toolchain, you might need an OpenSSL installation to be present on the machine. If you want to build a statically linked binary, you can add the vendored-openssl feature flag to the build command.

First make sure you have cloned the repo

git clone https://github.com/threefoldtech/mycelium.git

Then go into the cloned directory and build it

cd mycelium
cargo build

In case a release build is required, the --release flag can be added to the cargo command (cargo build --release).

Cross compilation

For cross compilation, it is advised to use the cross project. Alternatively, the standard way of cross compiling in rust can be used (by specifying the --target flag in the cargo build command). This might require setting some environment variables or local cargo config. On top of this, you should also provide the vendored-openssl feature flag to build and statically link a copy of openssl.

Remarks


mycelium's Issues

Local testing

Do some local testing. The main focus here is system stability and behavior

Introduce prefix type

Currently we use prefixes as intended in the protocol, but these are not properly utilized during the routing phase. Routing matches full IP addresses, which essentially limits us to sending data to the specific IP assigned at node start. To fix this:

  • Introduce a dedicated "prefix" type (IP address + subnet size); see the sketch after this list
  • Change existing protocol code to take in a prefix and return a prefix on reception. The prefix length in the data can then be derived from the prefix (this actually means we will automatically get dynamically sized prefixes instead of hardcoded ones).
  • In the routing, use the destination IP and look for a subnet where "the destination is part of said subnet". This will probably require some minor changes in the data structure used for the routing table.
  • Additionally, we should check and make sure that subnets are unique/non-overlapping.
  • To this end, we should also add some kind of route filter, which is applied to every incoming update when we learn a route. If the route fails the filter, it can be discarded. This way we can, for instance, limit an update to have a prefix of at most /64, and reject IPv4 routes.
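
A minimal sketch of what such a prefix type could look like, assuming plain Ipv6Addr values from the standard library; the names (Subnet, contains) are illustrative and not necessarily the ones used in the repository:

use std::net::Ipv6Addr;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Subnet {
    address: Ipv6Addr,
    prefix_len: u8,
}

impl Subnet {
    pub fn new(address: Ipv6Addr, prefix_len: u8) -> Option<Self> {
        (prefix_len <= 128).then_some(Subnet { address, prefix_len })
    }

    /// Returns true if `ip` falls inside this subnet, i.e. the first
    /// `prefix_len` bits of `ip` match those of the subnet address.
    pub fn contains(&self, ip: &Ipv6Addr) -> bool {
        let mask = if self.prefix_len == 0 {
            0
        } else {
            u128::MAX << (128 - self.prefix_len as u32)
        };
        u128::from_be_bytes(self.address.octets()) & mask
            == u128::from_be_bytes(ip.octets()) & mask
    }
}

Routing could then iterate the known subnets and pick the most specific one for which contains returns true for the destination address.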

a terminal is required

When executing mycelium in a Docker container through docker exec, the -t flag has to be passed. It is unclear whether this also impacts running it through zinit or systemd, or starting it from another process.

Don't send Updates for selected routes to peers which are the next hop of said route

Since we consider those peers to be the best path, sending these routes to them makes no sense: an update for such a route would imply that the peer could send a packet for this route to us, we would send it back to that peer, and it would then magically reach its destination.

All in all, these updates are redundant and only waste some bandwidth.

Enforce hops to have a cost

Currently, if a peer has a sub-1 ms RTT, it is considered to have a metric cost of 0, essentially causing havoc in the route selection. For instance, if there are 50 peers on the same subnet, the node might very well select a route to a remote destination which first hops over all 49 other peers, as this is considered to not cost anything, which is obviously false. A possible fix is sketched below.
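
One straightforward option, assuming the link cost is derived directly from the measured RTT (an assumption; the actual metric calculation may be more involved), is to clamp the computed cost to a minimum of 1:

fn link_cost(rtt_ms: u16) -> u16 {
    // A hop is never free: even a sub-millisecond link costs at least 1,
    // so a path hopping over many local peers is always more expensive than a direct one.
    rtt_ms.max(1)
}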

sort out the repo

0.9.0 is for project 3.13
1.0.0 is for project 3.14

please make sure the milestones are created and issues are properly assigned

Avoid using openssl

Currently used because of reqwest. Need to play with the features to avoid this. Either drop SSL/TLS support (only a local API anyway), or switch to rustls

Join routing tables

Currently there are 2 routing tables: the selected and the fallback. As the name implies, the selected routing table holds the selected route, while the fallback holds any non-selected routes which are or were recently viable.

The code needed to manage these multiple routing tables is however very cumbersome. Aside from the fact that we generally have 2 tables, thus using double the amount of memory simply for existing, we also need to do some extra work to ensure the tables are in a good state at all times while managing them. This is most notable in the route selection. While this can be optimized, doing so would require tracking a lot of different states and then making a large decision table. Since everything which influences state essentially doubles the amount of states, it was decided not to do this.

But if there were only 1 table with all the routes, a lot of this complexity would be avoided. It would also be better memory-wise for the reason stated above, and we can perfectly manage by changing the routes to be a simple vector instead. This obsoletes #27.

Rework message pop and peek to be able to block on the server

Currently the API, when doing a message peek/pop, immediately returns with either a message or no message. This is undesirable, since a client thus has to either poll periodically, meaning messages might not be received as soon as they arrive, or busy poll in a tight loop, which essentially burns a CPU core.

The API needs to be reworked so that a call to these methods can wait for some time before returning if no data is ready. Ideally, the amount of time can be configured. This naturally leads to the use of an optional query parameter defining the amount of time to wait. Other than that, we need to consider whether the default behavior is to return immediately or to wait for some default timeout.

Internally there are multiple ways to achieve this. One option would be to rework the VecDeque of finished messages into a tokio::sync::mpsc::Receiver. Another one would be to work with a Notify (kind of like a semaphore), which is fed when new data comes in; a sketch of this variant is shown below.
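
A rough sketch of the Notify-based variant, assuming the finished messages stay in a mutex-guarded VecDeque; the names (MessageInbox, pop, push) are illustrative only:

use std::collections::VecDeque;
use std::sync::Mutex;
use std::time::Duration;

use tokio::sync::Notify;

pub struct MessageInbox {
    messages: Mutex<VecDeque<Vec<u8>>>,
    notify: Notify,
}

impl MessageInbox {
    /// Pop a message, waiting up to `timeout` for one to arrive. Returns
    /// `None` if the deadline expires while the queue is still empty.
    pub async fn pop(&self, timeout: Duration) -> Option<Vec<u8>> {
        let deadline = tokio::time::Instant::now() + timeout;
        loop {
            if let Some(msg) = self.messages.lock().unwrap().pop_front() {
                return Some(msg);
            }
            // Wait until either new data is signalled or the deadline passes.
            if tokio::time::timeout_at(deadline, self.notify.notified())
                .await
                .is_err()
            {
                return None;
            }
        }
    }

    /// Called by the receive path when a complete message has been assembled.
    pub fn push(&self, msg: Vec<u8>) {
        self.messages.lock().unwrap().push_back(msg);
        self.notify.notify_one();
    }
}

Returning immediately can then simply be expressed as a zero timeout.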

Rework route acquisition

Currently the way routes are propagated is buggy, leading to strange behavior. While the RFC is somewhat murky about this, we do observe behavior that is for sure not correct. One problem is that we don't increase the route seqno, which is obviously not what is intended. This probably causes #40 (in conjunction with the fact that fallback routes are improperly cleared, see #2).

Additionally, we can't rely on info from a peer handshake in the router, see #42.

Review and document protocol

Mostly the protocol headers should be reviewed. Next to this, the data packet needs to be expanded with an intermediate header (inside the encrypted content) which details what kind of data it is. For now we only have raw packets, but this will be expanded in the future. By differentiating between these inside the encrypted content, the actual type of data is transparent for the hops.

Peer handshake can cause listening socket to exit

Currently there are 2 unwraps when accepting a new peer in the handshake. If a remote exits quickly, this can cause the listening socket to be killed by a panic. Since we already propagate errors up, simply use ? and handle the error in the caller.

Rewrite router internals to use a different concurrency mechanism

The router has 2 main functions: route data packets, and handle control flows. Since these things can happen simultaneously, the router needs to arrange how parts of its internals are accessed and modified, to avoid data races.

The initial implementation uses an RwLock (read-write lock), which is fine for the general case. For the read-only path (i.e. processing data packets), this does not seem to have any real impact on performance. The same, however, cannot be said for the write path. Since a write lock also prevents a read lock, and writes seem to take a decent amount of time, we essentially have a "stop the world" kind of situation in the router. This translates to ping spikes of tens of milliseconds (locally), since the read locks are waiting for the write lock to dissipate.

Considering the write path is infrequently used, it is likely better to look for an alternative concurrency approach which prioritizes read performance, even if it comes at a cost for write performance. A possible approach is to track 2 sets of mutable state and use an atomic pointer to point to the readable version. When we write, we do so on the copy that is not currently pointed to, then switch the pointer, and apply the same write to the other copy. In this scenario, reads would never be blocked.
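
A minimal sketch of this read-optimized idea, here expressed with the arc-swap crate rather than two fixed copies (an assumption; the actual implementation may use a different mechanism or hand-rolled atomics). Readers load the current snapshot without blocking, while the writer builds a new copy and atomically publishes it:

use std::collections::HashMap;
use std::sync::Arc;

use arc_swap::ArcSwap;

// Placeholder for the real routing table type.
type RoutingTable = HashMap<String, String>;

pub struct Router {
    routes: ArcSwap<RoutingTable>,
}

impl Router {
    pub fn new() -> Self {
        Router {
            routes: ArcSwap::from_pointee(RoutingTable::new()),
        }
    }

    /// Hot path: readers never take a lock, they just load the current snapshot.
    pub fn lookup(&self, dest: &str) -> Option<String> {
        self.routes.load().get(dest).cloned()
    }

    /// Cold path: the writer clones the table, applies the change and swaps the
    /// pointer. In-flight readers keep using the old snapshot until they drop it.
    pub fn update_route(&self, dest: String, next_hop: String) {
        let current = self.routes.load_full();
        let mut new_table = (*current).clone();
        new_table.insert(dest, next_hop);
        self.routes.store(Arc::new(new_table));
    }
}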

Improve route selection strategy

Currently a route is selected when an update arrives, based on the metric. This approach is problematic. Most notably, it does not account for the feasibility condition, and thus selects unfeasible routes, creating routing loops. If a route is advertised by peer A with some metric M to some node B, it might be that some node C connected to B learns this route from a peer D which is between B and C. This route will have the same seqno, but a higher metric (or the same in case of a local network). If node A now announces its route with a higher seqno and increased metric, node B might unselect the route to node A, since it has a route to A via C with a lower metric. At this point, we have created a routing loop.

According to the RFC, route selection must be run after every feasible update is applied. We also need to consider for every route whether it is still feasible. In practice, we can actually simplify this a little bit.

  • If the routing table has no route for the prefix, it is installed as the selected route.
  • If the routing table has a route for the prefix with seqno S and metric M, we apply the same logic as the feasibility check in the RFC:
    • If the update's seqno is higher than S, install the route from the update as the selected route. If the currently selected route is different (i.e. a different neighbour), move it to the fallback table.
    • If the update's seqno is equal to S, and its metric is strictly smaller than M, install the route as selected, and if the currently selected route has a different neighbour, move it to the fallback table.
    • We should also consider broadcasting this update to our peers. After all, new seqnos should be learned as soon as possible.
  • In case the route is already a fallback route, we update it in place using the same feasibility checks.

This approach guarantees that the best route (feasible and lowest metric) is installed for a prefix at any point in time. After all, the metric can only increase if the seqno increases, and increasing the seqno immediately makes all other routes unfeasible.

The only remaining issue is retraction updates. When a route is retracted, we should apply the update, and then run a full route selection on the prefix.
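
A compressed sketch of this selection rule, with heavily simplified types (the real code also tracks neighbours, route keys and timers); the comparison mirrors the feasibility check described above:

#[derive(Clone, Copy, PartialEq, Eq)]
struct RouteEntry {
    seqno: u16,
    metric: u16,
}

/// Decide whether an incoming update should replace the currently selected
/// route for a prefix: a higher seqno always wins, an equal seqno wins only
/// with a strictly smaller metric.
fn update_is_better(selected: Option<RouteEntry>, update: RouteEntry) -> bool {
    match selected {
        // No route yet for this prefix: install the update as the selected route.
        None => true,
        Some(current) => {
            // Note: seqno comparison should really be done modulo 2^16 as per
            // the RFC; a plain comparison keeps the sketch short.
            update.seqno > current.seqno
                || (update.seqno == current.seqno && update.metric < current.metric)
        }
    }
}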

mycelium takes a lot of time to be functional

I tried yesterday with @MarioBassem: we both connected to the same peer and tried to reach each other by sending messages, but it took a lot of time before it suddenly worked.
I tried the same today; it is taking too much time and giving this:
Screenshot from 2023-11-06 10-40-07

Improve allocation behavior for packets read from tun

Currently, an MTU-sized buffer is allocated, which is then used to read a packet from the tun. Next (assuming a valid destination), the packet is encrypted, and the nonce is appended. The current implementation does not reuse the allocation of the original data and instead allocates a new vector of sufficient size for the AES-GCM tag and encrypted data. Then, we extend the vector, possibly growing the allocation but very likely triggering a full reallocation, to append the nonce.

Since we know the max MTU, nonce size, and tag size, we can simply allocate a buffer large enough upfront, provide this to the read method (possibly with some slicing), and encrypt in place. This would remove 2 allocations and deallocations, as well as a memmove (which should not be that expensive), in the hot path of the code.
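
A rough sketch of the single-allocation approach using the aes-gcm crate's in-place API (an assumption: the repository may use different buffer handling, sizes, or crypto calls). One buffer is allocated up front, the packet is read into the front of it, encrypted in place, and the tag and nonce are written into the reserved tail:

use aes_gcm::aead::AeadInPlace;
use aes_gcm::{Aes256Gcm, Nonce};

const MTU: usize = 1420; // illustrative value
const TAG_SIZE: usize = 16;
const NONCE_SIZE: usize = 12;

/// Allocate a single reusable buffer with room for packet, tag and nonce.
fn packet_buffer() -> Vec<u8> {
    vec![0u8; MTU + TAG_SIZE + NONCE_SIZE]
}

/// Encrypt `packet_len` bytes at the front of `buf` in place, then write the
/// tag and nonce into the preallocated tail; no further allocation or copy of
/// the packet data is needed. Returns the total length of the wire frame.
fn encrypt_packet(
    cipher: &Aes256Gcm,
    buf: &mut [u8],
    packet_len: usize,
    nonce_bytes: [u8; NONCE_SIZE],
) -> usize {
    let nonce = Nonce::from_slice(&nonce_bytes);
    let tag = cipher
        .encrypt_in_place_detached(nonce, &[], &mut buf[..packet_len])
        .expect("encryption failed");
    buf[packet_len..packet_len + TAG_SIZE].copy_from_slice(tag.as_slice());
    buf[packet_len + TAG_SIZE..packet_len + TAG_SIZE + NONCE_SIZE].copy_from_slice(&nonce_bytes);
    packet_len + TAG_SIZE + NONCE_SIZE
}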

Add telemetry

Telemetry is needed to gain insight into how the application is behaving. There are 2 kinds we are interested in: traces and metrics. For metrics, we want information regarding the async runtime and the system it's running on (to see how changes in the system reflect on changes in the application and vice versa), and general stats such as transmitted packets, dropped packets, etc. For traces, it would be ideal if we could capture traces for entire packet flows (sending, forwarding, receiving), once again to verify how this behaves in various scenarios.

Split project into lib and bin

Right now the project is a single bin; it would be better if the majority of the business logic resided in a lib, which is then used by a small wrapper binary crate.

Rework (or drop) peer exchange handshake

Currently the overlay IP is exchanged, which is then also used in the router to generate retractions. This is not how Babel works, where there is no handshake. In our case, we could consider keeping the handshake (doing something different, like e.g. requesting a random signature to prove the receiver owns the private key), or dropping it outright.

Implement a way to send arbitrary messages with delivery confirmation

A user needs to have a way to send a message to a peer, where a best-effort attempt is made to deliver the message for some time. The status of the message (sent/delivered/other) needs to be queryable as well.

To transport the message (which can be of arbitrary size), it can be chunked into small packets (similar to regular data traffic as is), and encrypted as regular data. Then it is forwarded to the destination with the regular flow. Since this requires differentiation between message and regular data traffic, this requires #28 first.

To manage this, another type MessageBox (or similar) is made, which takes care of taking in arbitrary messages, chunking them, and injecting them into the router (see the sketch below). There also needs to be a way to filter data which is spit out by the router. Inside the message protocol, support needs to be added to acknowledge receipt (chunk based), and retransmit after some time.
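
A tiny sketch of the chunking step, assuming chunks roughly the size of a data packet payload (the value is illustrative); a real MessageBox would also track per-chunk acknowledgements and retransmission timers:

const CHUNK_SIZE: usize = 1300; // illustrative payload size

struct Chunk<'a> {
    message_id: u64,
    index: usize,
    total: usize,
    data: &'a [u8],
}

/// Split a message into numbered chunks, ready to be injected into the router.
fn chunk_message(message_id: u64, data: &[u8]) -> Vec<Chunk<'_>> {
    let total = data.len().div_ceil(CHUNK_SIZE);
    data.chunks(CHUNK_SIZE)
        .enumerate()
        .map(|(index, data)| Chunk { message_id, index, total, data })
        .collect()
}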

Properly send ICMP

This is a bit difficult to do, but in general we should be able to send some decent ICMP packets to the tun interface if something goes wrong. This includes no route to host in case we try to send to some unknown address.

Split selected and fallback routing table backing types

Right now a single RoutingTable type is used, both for selected and fallback routing table. However, the selected table will have at most 1 entry for a subnet (the selected route), so the inclusion of a HashSet is not required there. Creating a separate type to express this would offer some minor benefits on the hot path of the code.

Create subnet filter

Right now any update is accepted, but this should not be the case. Instead, a subnet filter should be created to limit acceptable updates. For now this can be a global filter, but eventually it would be interesting to have these filters configurable per peer.
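
One possible shape for such a filter, sketched as a trait so additional (and eventually per-peer) filters could be chained; the names are illustrative:

use std::net::IpAddr;

struct Subnet {
    address: IpAddr,
    prefix_len: u8,
}

/// A filter applied to every incoming update; updates for subnets it rejects
/// are discarded before they reach the routing table.
trait RouteFilter {
    fn allows(&self, subnet: &Subnet) -> bool;
}

/// Accept only IPv6 subnets no more specific than the configured prefix length,
/// e.g. MaxPrefixLen(64) limits updates to at most a /64 and rejects IPv4 routes.
struct MaxPrefixLen(u8);

impl RouteFilter for MaxPrefixLen {
    fn allows(&self, subnet: &Subnet) -> bool {
        subnet.address.is_ipv6() && subnet.prefix_len <= self.0
    }
}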

Cleanup codebase

The current codebase is pretty WIP. While it works, there are some standards which can be applied, and legacy code/comments which can be removed / refactored. This way we can start new development from a clean codebase

Add logging

Originally, printing happened with (e)print(ln)! statements, which is not ideal. This has changed to use the log facade crate, but some more work is needed to verify log levels and add logs in places where that makes sense.

Don't include link cost of destination peer in update

Currently an update advertises the local metric of a route + the link cost of the destination peer (the peer the update is sent to). This is however invalid. Instead, a peer should advertise a metric of route metric + next hop link cost. This seems similar at first; however, the main difference is that the static (i.e. local) subnet is always advertised with metric 0. This means that directly connected peers have this route saved with metric 0 as well, and advertise this route with a metric equal to their cost to the route source.

This will primarily result in more feasible routes. Feasibility distance is still computed from the advertised metric. This means that it includes the link cost of the final hop, whereas the metric in the advertisements will not, which is what ends up being checked against the feasibility distance.

Add API

There needs to be an API which provides some visibility into the internals. Mostly it should be able to expose the routing tables, but it might also be a good idea to be able to add or remove static peers through it.

Expand router ID to include some random data

In the RFC, a router ID is just 8 bytes of random data. In our case, it is 32 bytes: the public key of the private key owned by the system. This means that in our case, router IDs are "stable", i.e. a node (and prefix) always has the same router ID. Although it is not precisely specified in the RFC, the official Babel implementation always generates a random router ID on start, and you can't force the node to use a specific one.

It turns out that fixed router IDs lead to a "fast restart problem". In short, if a node restarts (and thus resets its seqno) with the exact same router ID, it might be that there are still (retracted) routes available on nodes in the network. While the routes themselves are not necessarily a problem, it also means that the source entry will still be present. The source entry is used to determine if an update is feasible. If the seqno in the source entry is not too far ahead of the now-0 seqno, it will cause updates to be unfeasible for an extended period of time. This will in turn prevent routes from being acquired again after a restart, until either the source entry is purged (which requires implementing a timer we currently don't have), or the seqno is bumped high enough.

As such, we need to expand the router ID to also include some randomness, so a restart changes the router ID. For this we will add 8 bytes: the first 2 will be set to 0, to allow us to use these for future purposes, and the next 6 will be random. Note that the ID only needs to be random compared with other IDs using the same public key (the first 32 bytes), so 48 bits (2^48 possible values) of randomness should be plenty.

As an added bonus, this also paves the way for anycasting.
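
A minimal sketch of the expanded 40-byte router ID layout described above (32 bytes of public key, 2 reserved zero bytes, 6 random bytes), using the rand crate purely for illustration:

use rand::RngCore;

const ROUTER_ID_SIZE: usize = 40;

fn generate_router_id(public_key: &[u8; 32]) -> [u8; ROUTER_ID_SIZE] {
    let mut id = [0u8; ROUTER_ID_SIZE];
    id[..32].copy_from_slice(public_key);
    // Bytes 32..34 stay zero, reserved for future use; the last 6 bytes are random.
    rand::thread_rng().fill_bytes(&mut id[34..]);
    id
}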

Limit amount of router seqno bumps

Currently, if there is an issue in the network, a flood of seqno bumps can come in, which will all increment the router seqno. It would be best to limit this to at most one bump every x amount of time, as sketched below.
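
One simple way to enforce this, assuming the router keeps track of when it last bumped its seqno (the names and interval handling are illustrative):

use std::time::{Duration, Instant};

struct SeqnoLimiter {
    last_bump: Option<Instant>,
    min_interval: Duration,
}

impl SeqnoLimiter {
    fn new(min_interval: Duration) -> Self {
        SeqnoLimiter { last_bump: None, min_interval }
    }

    /// Returns true if a seqno bump is allowed now, and records it; any further
    /// requests within `min_interval` are rejected.
    fn try_bump(&mut self) -> bool {
        let now = Instant::now();
        match self.last_bump {
            Some(prev) if now.duration_since(prev) < self.min_interval => false,
            _ => {
                self.last_bump = Some(now);
                true
            }
        }
    }
}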

too many fallback routes

There is still an issue with the population of the fallback routing table. When a node joins the network and first receives an update with the 65534 metric, and later one with a lower metric (once the metric has been established through Hello and IHU), the entry with metric 65534 correctly gets placed into the fallback table (instead of the selected one), but that route then stays there forever, even when a more appropriate fallback route comes in.

OSX compatibility

Currently rtnetlink is used to communicate with the Linux kernel. This does not work on macOS.

Isolate tun code

The tun code is currently mixed in between other code. It should be isolated into its own module. Also, since tun code and setting routes is generally OS-dependent, there should be a top-level module exposing a generic tun interface, which also has a Stream and Sink implementation. Then, submodules should be made for the specific implementations. Depending on which platform we compile for, the top-level implementation can select the proper submodule at compile time.

some peers are down

I got this peer list from @despiegk and only the last one is working:

- 185.206.122.77:9651
- 83.231.240.31:9651
- 146.185.93.83:961

Can we update the above two?
