coloclue / kees

KEES - The Coloclue Network Automation Toolchain

License: BSD 2-Clause "Simplified" License

Shell 12.82% Python 40.33% Jinja 46.85%

kees's People

Contributors

job nlmark traderose cybertinus pimvanpelt mrngm imrejonk stucchimax micko rkrieger tim427 ascionefrancesco

kees's Issues

Add RPKI ROV to member sessions

At the moment we only have static whitelists of IP addresses that members are allowed to announce via their BGP session. When a member has a public ASN, we should drop the whitelist and validate announcements via RPKI ROV instead, so members can add and remove prefixes without Coloclue having to change a whitelist.
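
To make the idea concrete, here is a minimal sketch of the route origin validation decision described in RFC 6811, written with the standard library only. The `VRP` tuple and the `rov_state` function are hypothetical names for illustration; in Kees the actual validation would presumably be done by the routing daemon, not by Python code.

```python
import ipaddress
from typing import Iterable, NamedTuple


class VRP(NamedTuple):
    """A Validated ROA Payload (hypothetical container for this sketch)."""
    prefix: str
    max_length: int
    asn: int


def rov_state(prefix: str, origin_asn: int, vrps: Iterable[VRP]) -> str:
    """Classify an announcement as 'valid', 'invalid' or 'unknown'."""
    announced = ipaddress.ip_network(prefix)
    covered = False
    for vrp in vrps:
        roa_net = ipaddress.ip_network(vrp.prefix)
        if roa_net.version != announced.version:
            continue
        # A VRP covers the announcement if the announced prefix lies inside it.
        if not announced.subnet_of(roa_net):
            continue
        covered = True
        # Valid needs a matching origin AS and a prefix length within max_length.
        if vrp.asn == origin_asn and announced.prefixlen <= vrp.max_length:
            return "valid"
    # Covered by at least one VRP but matched by none: invalid.
    return "invalid" if covered else "unknown"
```

With this rule a member can announce any prefix for which a matching ROA exists, and no whitelist has to be edited on the Coloclue side.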

ipaddr library has been superseded

From https://github.com/google/ipaddr-py/blob/master/README:

ipaddr.py is a library for working with IP addresses, both IPv4 and IPv6.

It has been superseded by ipaddress from the Python 3 standard library, and its
Python 2 backport.

We should consider using ipaddress from the Python 3 standard library.
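
As a rough illustration of what the switch looks like, the sketch below does the same kind of "is this address in this range" check with the standard library `ipaddress` module. The helper name `session_in_ranges` is made up for this example. One difference to keep in mind: `ipaddress.ip_network()` rejects prefixes with host bits set unless `strict=False` is passed.

```python
import ipaddress
from typing import Iterable


def session_in_ranges(session: str, ranges: Iterable[str]) -> bool:
    """Return True if the session address falls inside any of the ranges."""
    session_ip = ipaddress.ip_address(session)  # replaces ipaddr.IPAddress
    for text in ranges:
        # replaces ipaddr.IPNetwork; strict=False tolerates host bits
        subnet = ipaddress.ip_network(text, strict=False)
        # Membership is simply False when the IP versions differ.
        if session_ip in subnet:
            return True
    return False
```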

Add Juniper support

At the moment Kees only supports Bird 1.6.x as output. Support for Juniper's JunOS should be added, so those devices can be maintained by Kees too.

Add Peering LANs (for example AMS-IX) to no-export

Add all peering VLANs to the bogon list by default

No peering VLAN of any IX in the world should ever be announced in the DFZ. All known peering VLANs can be found on PeeringDB, so that information should be used to add them to the bogon list. That way we will never accept such an announcement, should it reach our routers.
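
A possible shape for this, assuming the list of IX LAN prefixes has already been downloaded from PeeringDB: the sketch turns the prefixes into BIRD-style prefix set entries, where a trailing `+` matches the prefix and everything more specific. The function name `bogon_entries` and the sample prefixes are hypothetical.

```python
import ipaddress
from typing import Dict, Iterable, List


def bogon_entries(ix_prefixes: Iterable[str]) -> Dict[int, List[str]]:
    """Return de-duplicated, sorted 'prefix+' entries per IP version."""
    nets = {ipaddress.ip_network(p) for p in ix_prefixes}
    result: Dict[int, List[str]] = {4: [], 6: []}
    for version in (4, 6):
        # Sort within one address family; mixed versions cannot be compared.
        for net in sorted(n for n in nets if n.version == version):
            result[version].append(f"{net}+")
    return result
```

The resulting lists could then be rendered into the existing bogon filter by the templates.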

peering_filters needs documentation, possibly refactoring

I worked on the peering_filters script today, mostly to try and understand how it works and to make some tiny improvements while I was at it. However, understanding how the script works is quite difficult because it is not well-documented. There are also many nested loops and if statements.

Add RFC1997 Communities for traffic engineering

As you can see, at the moment only large communities are supported when a peer wants to prepend 8283, or when it does not want a prefix announced to a specific ASN. Support for old-style communities (RFC 1997 communities) should be added too, to support devices that don't have Large Communities support yet.

$ whois as8283 | grep -A 24 ACTION
remarks:        ACTION COMMUNITIES:
remarks:        ===================
remarks:        RFC 1997  | Large      | Meaning (Action)
remarks:        ----------+------------+-------------------------------------
remarks:        8283:50   | 8283:1:50  | Set LOCAL_PREF to 50 (default is 100)
remarks:        8283:150  | 8283:1:150 | Set LOCAL_PREF to 150 (default is 100)
remarks:        -         | 8283:2:0   | Prepend 8283 once to all eBGP peers
remarks:        -         | 8283:2:nnn | Prepend 8283 once to peer AS nnn
remarks:        -         | 8283:3:0   | Prepend 8283 twice to all eBGP peers
remarks:        -         | 8283:3:nnn | Prepend 8283 twice to peer AS nnn
remarks:        -         | 8283:4:0   | Do not export to eBGP peers
remarks:        -         | 8283:4:nnn | Do not export to peer AS nnn
remarks:        65535:0   | -          | G-SHUT [draft-ietf-grow-bgp-gshut]
remarks:        65535:666 | -          | BLACKHOLE [RFC 7999]
remarks:        ----------+------------+-------------------------------------
remarks:
remarks:        MEMBERS ONLY:
remarks:        =============
remarks:        RFC 1997  | Large      | Meaning (Action)
remarks:        ----------+------------+-------------------------------------
remarks:        8283:50   | 8283:1:50  | Set LOCAL_PREF to 50 (default is 500)
remarks:                  |            | WARNING: default is 100 for other routes, so your prefixes may be rendered unusable!
remarks:        8283:450  | 8283:1:450 | Set LOCAL_PREF to 450 (default is 500)
remarks:        8283:550  | 8283:1:550 | Set LOCAL_PREF to 550 (default is 500)
remarks:        ----------+------------+-------------------------------------
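
One practical constraint when designing the RFC 1997 equivalents: each half of a classic community is only 16 bits wide, so a per-peer action cannot carry a 4-byte ASN the way `8283:2:nnn` does. The sketch below, with the hypothetical helper `parse_community`, parses both notations and enforces the respective ranges.

```python
from typing import Tuple


def parse_community(text: str) -> Tuple[int, ...]:
    """Parse 'a:b' (RFC 1997) or 'a:b:c' (large community) into ints."""
    parts = [int(p) for p in text.split(":")]
    if len(parts) == 2:
        limit = 0xFFFF          # two 16-bit fields
    elif len(parts) == 3:
        limit = 0xFFFFFFFF      # three 32-bit fields
    else:
        raise ValueError(f"not a community: {text!r}")
    for part in parts:
        if not 0 <= part <= limit:
            raise ValueError(f"field {part} out of range in {text!r}")
    return tuple(parts)
```

For example, `parse_community("8283:4:200020")` is accepted, while `parse_community("8283:200020")` raises `ValueError` because 200020 does not fit in 16 bits.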

Migration to BIRD 2

Network.CZ has announced that BIRD 1.6 will be end of life at the end of the year.
Kees still uses BIRD 1.6 and has to be migrated to BIRD 2.

Add the ability to ignore certain ports on IX-es

It should be possible to configure in peers.yaml a list of routers that should be ignored for a specific peer. This would make it easier to not set up a session to a specific peer.
Right now you have to configure peers.yaml to ignore everything that is on PeeringDB, copy the router IPs and prefix limits from PeeringDB to peers.yaml manually, and keep them up to date afterwards. This approach will cause problems sooner or later.
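
A sketch of how such an ignore list could be applied, assuming a hypothetical per-peer key `ignore_sessions` in peers.yaml that lists the router IPs to skip. Neither the key name nor the function exists in Kees today; the peer entry is passed in as an already-parsed dictionary.

```python
import ipaddress
from typing import Iterable, List, Mapping


def filter_sessions(peeringdb_ips: Iterable[str], peer_config: Mapping) -> List[str]:
    """Drop PeeringDB session IPs that the peer entry asks us to ignore."""
    # Compare parsed addresses so different IPv6 spellings still match.
    ignored = {
        ipaddress.ip_address(ip)
        for ip in peer_config.get("ignore_sessions", [])
    }
    return [ip for ip in peeringdb_ips if ipaddress.ip_address(ip) not in ignored]
```

Everything else (remaining router IPs, prefix limits) would still come from PeeringDB, so nothing has to be copied by hand.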

check PeeringDB API calls

PeeringDB has changed the way its API works. We need to check whether our way of using the PeeringDB API is still correct. Maybe we should migrate to a local PDB cache.
More information: https://lists.peeringdb.com/pipermail/pdb-tech/2022-May/000408.html

Add documentation

At the moment there is no documentation other than the README. A lot more should be written to make the software understandable for people other than the original authors.

peering_filters: checking "is ip in subnet" takes 26% execution time

After another optimization (see #31), I've looked into other parts of the code that could be optimized. When running peering_filters (without all, so no calls to bgpq3; also having a locally cached version of 2 Peering DB JSON files, and the ColoClue peers YAML), pprofile reports the following:

Command line: ./peering_filters
Total duration: 42.9381s
File: /home/mrngm/.local/lib/python3.9/site-packages/ipaddr.py
File duration: 11.2164s (26.12%)

The responsible call in peering_filters:

for asn in peerings:
    # [..]
    for session in sessions:
        session_ip = ipaddr.IPAddress(session)
        for ixp in ixp_map:
            for subnet in ixp_map[ixp]:
                if session_ip in subnet: # this call

For another project, I came across PyTricia that can efficiently determine if an IP address (either IPv4 or IPv6) is in a certain subnet. I've patched this into peering_filters as follows:

diff --git a/peering_filters b/peering_filters
index ee6c03a..7666d83 100755
--- a/peering_filters
+++ b/peering_filters
@@ -23,6 +23,8 @@ import sys
 import time
 import yaml
 
+import pytricia
+
 
 def download(url):
     try:
@@ -94,8 +96,11 @@ with open("./cc-peers.yml") as f:
 ixp_map = {}
 router_map = {}
 for ixp in generic['ixp_map']:
-    ixp_map[ixp] = [ipaddr.IPNetwork(generic['ixp_map'][ixp]['ipv4_range']),
-                    ipaddr.IPNetwork(generic['ixp_map'][ixp]['ipv6_range'])]
+    #ixp_map[ixp] = [ipaddr.IPNetwork(generic['ixp_map'][ixp]['ipv4_range']),
+    #                ipaddr.IPNetwork(generic['ixp_map'][ixp]['ipv6_range'])]
+    ixp_map[ixp] = pytricia.PyTricia()
+    ixp_map[ixp].insert(generic['ixp_map'][ixp]['ipv4_range'], ixp)
+    ixp_map[ixp].insert(generic['ixp_map'][ixp]['ipv6_range'], ixp)
     router_map[ixp] = []
     for router in generic['ixp_map'][ixp]['present_on']:
         router_map[ixp].append(router)
@@ -287,11 +292,10 @@ for asn in peerings:
     else:
         continue
 
-    for session in sessions:
-        session_ip = ipaddr.IPAddress(session)
+    for session_ip in sessions:
         for ixp in ixp_map:
-            for subnet in ixp_map[ixp]:
-                if session_ip in subnet:
+            for im_circumventing_fixing_this_large_indentation_block in [1]:
+                if session_ip in ixp_map[ixp]: # pytricia lookup
                     print("found peer %s in IXP %s" % (session_ip, ixp))
                     print("must deploy on %s" % " ".join(router_map[ixp]))
                     description = peerings[asn]['description']

After profiling again:

Command line: ./peering_filters
Total duration: 27.7758s
File: ./peering_filters
File duration: 6.03189s (21.72%)
[..]
File: /home/mrngm/.local/lib/python3.9/site-packages/ipaddr.py
File duration: 0.525969s (1.89%)

(and now most of the time is in parsing YAML).

Is this something you would consider an interesting optimization? If there is any way I can verify that this patch does not break configuration, please let me know!
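
For comparison, a similar lookup can be sketched with the standard library only, which would avoid the extra dependency. This is not the PyTricia patch above and has not been benchmarked against it: it keeps one hash table per (IP version, prefix length) and masks the address once per table, so the cost depends on the number of distinct prefix lengths instead of the number of subnets. The class name `SubnetIndex` is hypothetical.

```python
import ipaddress
from typing import Dict, Optional, Tuple


class SubnetIndex:
    """Longest-prefix lookup from an address to a label such as an IXP name."""

    def __init__(self) -> None:
        # (version, prefixlen) -> {network address as int -> label}
        self._tables: Dict[Tuple[int, int], Dict[int, str]] = {}

    def insert(self, prefix: str, label: str) -> None:
        net = ipaddress.ip_network(prefix)
        table = self._tables.setdefault((net.version, net.prefixlen), {})
        table[int(net.network_address)] = label

    def lookup(self, address: str) -> Optional[str]:
        ip = ipaddress.ip_address(address)
        value = int(ip)
        best: Optional[Tuple[int, str]] = None
        for (version, prefixlen), table in self._tables.items():
            if version != ip.version:
                continue
            # Clear the host bits, then look the network up directly.
            shift = ip.max_prefixlen - prefixlen
            label = table.get((value >> shift) << shift)
            if label is not None and (best is None or prefixlen > best[0]):
                best = (prefixlen, label)
        return best[1] if best else None
```

With a single index holding all IXP ranges, the loop over `ixp_map` would collapse into one `lookup()` call per session.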

Graceful shutdown correctly implemented?

During the last maintenance the Graceful Shutdown community was enabled, but it wasn't immediately clear from the routes whether the community was actually set. Please double-check whether it is.

Switch to well known large communities for RPKI

We are using our own large communities for the different RPKI states (valid, invalid, unknown). There are well-known large communities for these states. To adhere to the standard, we should update our configurations to use those.
The well-known large communities are:

valid 0x4300:0
unknown 0x4300:1
invalid 0x4300:2

Add check to not push an empty config

When the process of generating new configuration is done, Kees should not only check if the syntax for the vendor in question is correct, but it should also check if the new config actually does something. A config without any BGP peers, or no OSPF enabled interfaces can have a correct syntax, but the network will go down. This should also be checked.
Once this check is added, it should also run as part of the Travis validation, if that is configured.
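
A very rough sketch of such a sanity check for BIRD output, using regular expressions only. It assumes BIRD-style statements such as `protocol bgp name {` and `interface "eth0"`; the function name and thresholds are hypothetical. Note that it counts every `interface` statement, so a real check would have to restrict the count to OSPF blocks.

```python
import re
from typing import List

# Assumption: BIRD-style syntax in the generated configuration.
BGP_PROTOCOL = re.compile(r"^\s*protocol\s+bgp\b", re.MULTILINE)
INTERFACE = re.compile(r'^\s*interface\s+"[^"]+"', re.MULTILINE)


def sanity_problems(config_text: str, min_bgp: int = 1,
                    min_interfaces: int = 1) -> List[str]:
    """Return a list of reasons why the config looks suspiciously empty."""
    problems = []
    bgp_count = len(BGP_PROTOCOL.findall(config_text))
    if bgp_count < min_bgp:
        problems.append(f"found {bgp_count} BGP protocols, expected >= {min_bgp}")
    iface_count = len(INTERFACE.findall(config_text))
    if iface_count < min_interfaces:
        problems.append(
            f"found {iface_count} interface statements, expected >= {min_interfaces}"
        )
    return problems
```

A deployment script could refuse to push whenever the returned list is not empty.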

Split generation of the config to make it more independent

The generation of the config should be more modular. The advantage is that when one part fails, the other parts are still generated without problems. For instance: when the generation of the eBGP filter fails, the static routes towards our members should still be generated just fine.

Parts that could be made separate, as a quick idea:

eBGP filter and session config
Transit config
OSPF and iBGP config
static routes toward the members
BGP sessions toward the members
Misc stuff in the config generation
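
The isolation between parts could look roughly like this: each part is a separate callable, and a failure in one is recorded without stopping the others. The function `run_generators` and the part names are hypothetical, not existing Kees code.

```python
import logging
from typing import Callable, Dict, Iterable, Tuple


def run_generators(
    generators: Iterable[Tuple[str, Callable[[], str]]]
) -> Tuple[Dict[str, str], Dict[str, Exception]]:
    """Run every (name, generator) pair and collect outputs and failures."""
    outputs: Dict[str, str] = {}
    failures: Dict[str, Exception] = {}
    for name, generate in generators:
        try:
            outputs[name] = generate()
        except Exception as exc:  # keep going; one broken part must not block the rest
            logging.error("generating %s failed: %s", name, exc)
            failures[name] = exc
    return outputs, failures
```

The caller can then decide per part whether to deploy the new output or keep the previous version.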

Make update-routers.sh less Coloclue specific

A lot of Coloclue specific things are hardcoded in update-routers.sh (hostnames, FQDNs, path to the needed SSH key). This should be more generic so other people can use Kees more easily.

Upgrade to Python 3

All the Python code is currently written for Python 2. It is time to upgrade to Python 3.

Add 100.64.0.0/10 to bogon list

Add Travis job for validation

When a new commit, PR or branch is merged into the master branch, a Travis job should run to validate if new code still is valid (no syntax errors and stuff like that) and if it still generates a valid Bird config (a config that has no syntax errors according to Bird).
So: generate-peer-config.sh and update-routers.sh check should be run by Travis.

Make the output directories configurable

The output directories are currently hardcoded. They should be configurable via environment variables or command-line arguments, so you can easily override them. This is needed to be able to create debug jobs that don't interfere with the automatic production jobs: they should write the generated config to different directories.
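
One way this could be wired up, with a command-line argument taking precedence over an environment variable and a built-in default as the fallback. The option `--output-dir`, the variable `KEES_OUTPUT_DIR` and the default `./output` are hypothetical names chosen for this sketch.

```python
import argparse
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence


def resolve_output_dir(argv: Optional[Sequence[str]] = None,
                       environ: Optional[Mapping[str, str]] = None) -> Path:
    """Pick the output directory: CLI argument, then environment, then default."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(description="generate router configuration")
    parser.add_argument(
        "--output-dir",
        default=environ.get("KEES_OUTPUT_DIR", "./output"),
        help="directory to write the generated configuration to",
    )
    args = parser.parse_args(argv)
    return Path(args.output_dir)
```

A debug job could then run with `KEES_OUTPUT_DIR` pointing at a scratch directory while production keeps the default.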

Add support for AS-sets for members

A few members have an ASN, and thus also their own IP space. They have created an AS-set to define this IP space. We should add support for those AS-sets, so we don't have to update the allowed prefixes by hand.
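
Since peering_filters already calls bgpq3, the same tool could expand a member's AS-set. The sketch below assumes bgpq3's JSON output (`-j`) is an object mapping the list name given with `-l` to entries that each carry a `prefix` field; that shape should be verified against the installed version. The function names and the list name `prefixes` are hypothetical.

```python
import json
import subprocess
from typing import List


def bgpq3_command(as_set: str, version: int = 4, name: str = "prefixes") -> List[str]:
    """Build the bgpq3 invocation for one AS-set and address family."""
    return ["bgpq3", f"-{version}", "-j", "-l", name, as_set]


def parse_prefixes(json_text: str, name: str = "prefixes") -> List[str]:
    """Extract the de-duplicated prefixes from bgpq3 JSON output (assumed shape)."""
    data = json.loads(json_text)
    return sorted({entry["prefix"] for entry in data.get(name, [])})


def allowed_prefixes(as_set: str, version: int = 4) -> List[str]:
    """Run bgpq3 and return the prefixes a member may announce."""
    result = subprocess.run(
        bgpq3_command(as_set, version),
        check=True, capture_output=True, text=True,
    )
    return parse_prefixes(result.stdout)
```

The returned list could replace the hand-maintained allowed prefixes for members that publish an AS-set.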