
pycommunityid's Introduction

pycommunityid

This package provides a Python implementation of the open Community ID flow hashing standard.

It supports Python versions 2.7+ (for not much longer) and 3+.


Installation

This package is available on PyPI, therefore:

pip install communityid

To install locally from a git clone, you can also use pip, e.g. by saying

pip install -U .

Usage

The API breaks the computation into two steps: (1) creation of a flow tuple object, (2) computation of the Community ID string on this object. It supports various input types in order to accommodate network byte order representations of flow endpoints, high-level ASCII, and ipaddress objects.

Here's what it looks like:

import communityid

cid = communityid.CommunityID()
tpl = communityid.FlowTuple.make_tcp('127.0.0.1', '10.0.0.1', 1234, 80)

print(cid.calc(tpl))

This will print "1:mgRgpIZSu0KHDp/QrtcWZpkJpMU=".
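To illustrate what the two steps involve, here is a minimal standard-library sketch of Community ID version 1 for TCP and UDP style flows, written from the published specification. It is not this package's code: the function name community_id_v1 and its parameters are made up for the example, and ICMP handling is left out.

```python
import base64
import hashlib
import ipaddress
import struct

PROTO_TCP = 6  # IANA protocol number for TCP


def community_id_v1(saddr, daddr, sport, dport, proto=PROTO_TCP, seed=0):
    """Hypothetical helper, not part of the communityid package."""
    # Step 1: build the flow tuple in network byte order (NBO).
    src = ipaddress.ip_address(saddr).packed
    dst = ipaddress.ip_address(daddr).packed
    sp = struct.pack('!H', sport)
    dp = struct.pack('!H', dport)

    # Put the smaller (address, port) endpoint first so that both
    # directions of the same flow produce the same hash input.
    if (src, sp) > (dst, dp):
        src, dst, sp, dp = dst, src, dp, sp

    # Step 2: SHA-1 over seed, addresses, protocol, one pad byte, ports;
    # then base64-encode the digest and prefix the version number.
    data = struct.pack('!H', seed) + src + dst + struct.pack('!BB', proto, 0) + sp + dp
    digest = hashlib.sha1(data).digest()
    return '1:' + base64.b64encode(digest).decode('ascii')


forward = community_id_v1('127.0.0.1', '10.0.0.1', 1234, 80)
reverse = community_id_v1('10.0.0.1', '127.0.0.1', 80, 1234)
print(forward, forward == reverse)
```

Both directions of the flow yield the same string because of the endpoint ordering. If the sketch follows the specification faithfully, the forward call should reproduce the value quoted above, but treat the package itself as the reference.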

The package includes four sample applications:

  • community-id, which calculates the ID directly for given flow tuples. It supports a small but growing list of parsers. Example:

    $ community-id tcp 10.0.0.1 10.0.0.2 10 20
    1:9j2Dzwrw7T9E+IZi4b4IVT66HBI=
    
  • community-id-pcap, which iterates over a pcap via dpkt and renders Community ID values for each suitable packet in the trace. This exercises the package's "low-level" API, using flow tuple values as you'd encounter them in a typical network monitor.

  • community-id-pcapfilter, which iterates over a pcap via dpkt and produces a pcap of only those packets whose Community IDs have a specific value, filtering out all others.

  • community-id-tcpdump, which takes tcpdump output on stdin and augments it with Community ID values on stdout. This exercises the package's "high-level" API, using ASCII representations of tuple values.

Testing

The package includes a unittest testsuite in the tests directory that runs without installation of the module. After changing into that folder you can invoke it e.g. via

python -m unittest communityid_test

or

nose2 -C --coverage ../communityid --coverage-report term-missing communityid_test

or by running ./communityid_test.py directly.

pycommunityid's People

Contributors

ckreibich, philhagen

pycommunityid's Issues

Community ID generated using pycommunityid does not match the one generated by Suricata

Issue

I have a pcap. When I run Suricata on it, it produces flows with Community IDs (CIDs).
When I run Zeek on the same pcap and generate the CID of each Zeek flow using the pycommunityid library, some flows don't get the same CID that Suricata produced.


Steps to reproduce

Here's the pcap I used: https://github.com/stratosphereips/StratosphereLinuxIPS/blob/develop/dataset/test7-malicious.pcap

I ran Suricata on it using the following command:
suricata -r test7-malicious.pcap

I ran Zeek on it using the following command:
zeek -C -r test7-malicious.pcap

For each line in the Zeek conn.log output, I ran the following code to get the CID of the flow:

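import communityid

# Method of the reporter's flow-handling class. The name get_community_id is
# assumed here; self.community_id is presumably a communityid.CommunityID().
def get_community_id(self, flow):
    proto = flow.proto.lower()
    # Map the Zeek protocol name to the matching FlowTuple constructor.
    cases = {
        'tcp': communityid.FlowTuple.make_tcp,
        'udp': communityid.FlowTuple.make_udp,
        'icmp': communityid.FlowTuple.make_icmp,
    }
    try:
        tpl = cases[proto](flow.saddr, flow.daddr, flow.sport, flow.dport)
        return self.community_id.calc(tpl)
    except KeyError:
        # Unsupported protocol: no Community ID.
        return ''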

Now, for example, this flow produced by Suricata:

{"timestamp": "2018-03-09T22:49:16.520001+0200", "flow_id": 1898491295854895, "event_type": "flow", "src_ip": "fe80:0000:0000:0000:00d2:4591:568e:c3d1", "src_port": 5353, "dest_ip": "ff02:0000:0000:0000:0000:0000:0000:00fb", "dest_port": 5353, "proto": "UDP", "app_proto": "failed", "flow": {"pkts_toserver": 13, "pkts_toclient": 0, "bytes_toserver": 5188, "bytes_toclient": 0, "start": "2018-03-09T22:49:16.553263+0200", "end": "2018-03-09T22:50:26.234272+0200", "age": 70, "state": "new", "reason": "timeout", "alerted": false}, "community_id": "1:JpepHprmBz0RFdlLGhEMO4jAPvA="}

is the same as this flow produced by Zeek:

conn.log:{"ts":1520628556.553263,"uid":"CJwrIjmGopvQP6Gx1","id.orig_h":"fe80::d2:4591:568e:c3d1","id.orig_p":5353,"id.resp_h":"ff02::fb","id.resp_p":5353,"proto":"udp","service":"dns","duration":14.121544122695923,"orig_bytes":1892,"resp_bytes":0,"conn_state":"S0","local_orig":false,"local_resp":false,"missed_bytes":0,"history":"D","orig_pkts":7,"orig_ip_bytes":2228,"resp_pkts":0,"resp_ip_bytes":0,"orig_l2_addr":"68:5b:35:b1:55:93","resp_l2_addr":"33:33:00:00:00:fb"}

However, pycommunityid gives me this CID: 1:Ij3wBn8AhEgwlNMz41h3vXi0yL8=, which doesn't match the one Suricata produced for the same flow.


Update

When I generated the CID using Corelight's Zeek plugin, Corelight/CommunityID, I got the same CID as with the pycommunityid library:

{"ts":1520628556.553263,"uid":"C0ADPg3q0T5H6xlzdb","id.orig_h":"fe80::d2:4591:568e:c3d1","id.orig_p":5353,"id.resp_h":"ff02::fb","id.resp_p":5353,"proto":"udp","service":"dns","duration":14.121544122695923,"orig_bytes":1892,"resp_bytes":0,"conn_state":"S0","local_orig":false,"local_resp":false,"missed_bytes":0,"history":"D","orig_pkts":7,"orig_ip_bytes":2228,"resp_pkts":0,"resp_ip_bytes":0,"orig_l2_addr":"68:5b:35:b1:55:93","resp_l2_addr":"33:33:00:00:00:fb","community_id":"1:Ij3wBn8AhEgwlNMz41h3vXi0yL8="}

I guess this means that Suricata is the one doing something wrong, and not pycommunityid?
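One possible cause that can be checked quickly is the address notation: Suricata logs the IPv6 addresses fully expanded, while Zeek logs them compressed. The following sketch uses only Python's ipaddress module (the helper name same_address is made up) to confirm that both notations pack to the same 16 bytes, so the textual form alone should not change the hash input.

```python
import ipaddress


def same_address(a, b):
    """Hypothetical helper: compare two textual addresses by their packed bytes."""
    return ipaddress.ip_address(a).packed == ipaddress.ip_address(b).packed


# Source and destination addresses as logged by Suricata and by Zeek.
src_match = same_address("fe80:0000:0000:0000:00d2:4591:568e:c3d1",
                         "fe80::d2:4591:568e:c3d1")
dst_match = same_address("ff02:0000:0000:0000:0000:0000:0000:00fb",
                         "ff02::fb")
print(src_match, dst_match)
```

Both comparisons come out equal, so the mismatch has to come from something other than how the addresses are written.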

FlowTupleError port invalid for specific ports

Hi, I started to experiment with community ID and pycommunityid and I think that I found a bug in function in_nbo():

def in_nbo(self):
    """
    Returns a copy of this tuple where the addresses and port are
    rendered into NBO byte strings.
    """
    saddr = self._addr_to_nbo(self.saddr)
    daddr = self._addr_to_nbo(self.daddr)
    if isinstance(self.sport, int):
        sport = struct.pack('!H', self.sport)
    else:
        sport = self.sport
    if isinstance(self.dport, int):
        dport = struct.pack('!H', self.dport)
    else:
        dport = self.dport
    return FlowTuple(self.proto, saddr, daddr, sport, dport, self.is_one_way)

  • The problem is in the creation of the new FlowTuple at the end of the function.
  • An exception occurs if sport or dport is in the range 11569-11577:
    communityid.error.FlowTupleError: Destination port "b'-1'" invalid

You can test it with your sample application:
$ community-id tcp 10.0.0.1 10.0.0.2 10 11569

Number 11569 in hex is 0x2D31 and that is '-1' in ASCII. I think the problem is with this line in function is_port(val):

port = int(val)

  • The bytes value of 11569 is parsed as -1, which is wrong.
  • I think other port numbers can be problematic too, because their packed bytes also read as ASCII digits:
    • port 14392 is 0x3838 in hex, which is "88" in ASCII

I hope somebody can check this bug and find a solution.
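The ambiguity described above can be reproduced with the standard library alone. The sketch below also shows one possible way around it, deciding by type rather than by content. The helper port_to_int is hypothetical and is not the library's is_port(); it assumes Python 3, where bytes and str are distinct types.

```python
import struct

# 11569 == 0x2D31, whose two NBO bytes are ASCII '-' and '1'.
packed = struct.pack('!H', 11569)
misread = int(packed)  # int() accepts bytes and parses them as text


def port_to_int(val):
    """Hypothetical helper: interpret val as a port without guessing from content."""
    if isinstance(val, (bytes, bytearray)):
        # Byte strings are always treated as 2-byte NBO values.
        if len(val) != 2:
            raise ValueError("NBO port must be exactly 2 bytes")
        return struct.unpack('!H', bytes(val))[0]
    # Everything else (int, ASCII string) goes through int().
    port = int(val)
    if not 0 <= port <= 65535:
        raise ValueError("port out of range")
    return port


print(packed, misread, port_to_int(packed), port_to_int(struct.pack('!H', 14392)))
```

Here packed is b'-1' and int() reads it as -1, while the type-based helper recovers 11569. The second case, 14392, packs to b'88', which int() would read as 88 without raising any error.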

additional request-reply pair ICMP Extended Echo

At https://tools.ietf.org/html/rfc8335 an additional request-reply pair is documented.
ICMP field Type: Extended Echo Request. The value for ICMPv4 is 42. The value for ICMPv6 is 160
ICMP field Type: Extended Echo Reply. The value for ICMPv4 is 43. The value for ICMPv6 is 161

Since that wasn’t in the version 1 table, hashes have been using the code octet instead of the request-response peer type value.
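A rough sketch of the effect, assuming the usual Community ID treatment of ICMP: a type with a known counterpart uses (type, counterpart type) as its port pair, and any other type falls back to (type, code) as a one-way flow. The tables below are deliberately partial (echo only), and all names are made up for illustration rather than taken from the library.

```python
# Partial request/reply tables; only the echo pair is listed for brevity.
ICMP4_PAIRS = {8: 0, 0: 8}          # Echo Request <-> Echo Reply
ICMP6_PAIRS = {128: 129, 129: 128}  # Echo Request <-> Echo Reply

# RFC 8335 Extended Echo Request/Reply, as proposed in this issue.
ICMP4_EXTENDED = {42: 43, 43: 42}
ICMP6_EXTENDED = {160: 161, 161: 160}


def icmp_ports(mtype, mcode, pairs):
    """Hypothetical helper: return (sport, dport, is_one_way) for an ICMP message."""
    if mtype in pairs:
        # Known request/reply type: use the counterpart type as the peer "port".
        return mtype, pairs[mtype], False
    # Unknown type: fall back to the code octet and treat the flow as one-way.
    return mtype, mcode, True


without_ext = icmp_ports(42, 0, ICMP4_PAIRS)
with_ext = icmp_ports(42, 0, {**ICMP4_PAIRS, **ICMP4_EXTENDED})
print(without_ext, with_ext)
```

Without the extra entries, an ICMPv4 Extended Echo Request hashes as (42, 0) one-way; with them it hashes as (42, 43) and pairs up with its reply.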
