idaholab / malcolm

This project forked from cisagov/malcolm

329 stars · 19 watchers · 55 forks · 211.1 MB

Malcolm is a powerful, easily deployable network traffic analysis tool suite for full packet capture artifacts (PCAP files), Zeek logs and Suricata alerts.

Home Page: https://idaholab.github.io/Malcolm/

License: Other

Dockerfile 5.35% CSS 15.84% Shell 18.58% HTML 1.86% PHP 0.24% Python 42.47% Zeek 6.07% JavaScript 4.48% Ruby 3.81% Vim Script 0.01% PowerShell 1.04% Makefile 0.10% Perl 0.17%
network-security security arkime cybersecurity infosec network-traffic-analysis networksecurity networktrafficanalysis opensearch opensearch-dashboards pcap suricata zeek

Malcolm's Introduction

Click here for information on the first-ever Malcolm community conference, Mal.Con '24!

Malcolm

Malcolm is a powerful network traffic analysis tool suite designed with the following goals in mind:

  • Easy to use – Malcolm accepts network traffic data in the form of full packet capture (PCAP) files and Zeek logs. These artifacts can be uploaded via a simple browser-based interface or captured live and forwarded to Malcolm using lightweight forwarders. In either case, the data is automatically normalized, enriched, and correlated for analysis.
  • Powerful traffic analysis – Visibility into network communications is provided through two intuitive interfaces: OpenSearch Dashboards, a flexible data visualization plugin with dozens of prebuilt dashboards providing an at-a-glance overview of network protocols; and Arkime (formerly Moloch), a powerful tool for finding and identifying the network sessions comprising suspected security incidents.
  • Streamlined deployment – Malcolm operates as a cluster of Docker containers – isolated sandboxes that each serve a dedicated function of the system. This Docker-based deployment model, combined with a few simple scripts for setup and run-time management, makes Malcolm suitable to be deployed quickly across a variety of platforms and use cases; whether it be for long-term deployment on a Linux server in a security operations center (SOC) or for incident response on a MacBook for an individual engagement.
  • Secure communications – All communications with Malcolm, both from the user interface and from remote log forwarders, are secured with industry standard encryption protocols.
  • Permissive license – Malcolm is composed of several widely used open-source tools, making it an attractive alternative to security solutions requiring paid licenses.
  • Expanding control systems visibility – While Malcolm is great for general-purpose network traffic analysis, its creators see a particular need in the community for tools providing insight into protocols used in industrial control systems (ICS) environments. Ongoing Malcolm development will aim to provide additional parsers for common ICS protocols.

Although all the open-source tools that make up Malcolm are already available and in general use, Malcolm provides a framework of interconnectivity that makes it greater than the sum of its parts.

In short, Malcolm provides an easily deployable network analysis tool suite for full PCAP files and Zeek logs. While Internet access is required to build Malcolm, it is not required at runtime.

Documentation

See the Malcolm documentation.

Share your feedback

You can help steer Malcolm's development by sharing your ideas and feedback. Please take a few minutes to complete this survey ↪ (hosted on Google Forms) so we can understand the members of the Malcolm community and their use cases for this tool.

Copyright and License

Malcolm is Copyright 2024 Battelle Energy Alliance, LLC, and is developed and released through the cooperation of the Cybersecurity and Infrastructure Security Agency of the U.S. Department of Homeland Security.

Malcolm is licensed under the Apache License, version 2.0. See LICENSE.txt for the terms of its release.

Contact information of author(s):

[email protected]

Malcolm's People

Contributors

0xflotus, 0xshaft03, aglad-eng, aut0exec, cclauss, dependabot[bot], jadams, jarscott1, kkvarfordt, lgtm-migrator, melaniepierce, mmguero, n8hacks, njinx, obsidianknife, piercema, schallee, scott-jeffery, supcom234, theenawman


Malcolm's Issues

Button for wiping Malcolm data "on the fly"

From Malcolm created by mmguero: cisagov#11

There is scripts/wipe.sh but it would be nice to have a GUI-ish way for wiping data on the fly. Maybe tie this in to Moloch's ES Indices tab, implement something to multi-select indices or pattern-select indices and delete everything with one click?

pcap file with malformed (too long) data is not indexed properly

From Malcolm created by mmguero: cisagov#7

see https://wiki.wireshark.org/SampleCaptures and search for c05-http-reply-r1.pcap.

This is malformed data, but it is discarded by Elasticsearch because it is too long. Is this something to be concerned about? Maybe, maybe not. I just wanted to document it to see if we want to do something about it.

There is a Logstash truncate filter that could address this; however, we don't want to apply it to all fields, as that would be expensive.

Very large pcaps don't get processed

Moved from cisagov#168 by @dajohn78

🐛 Summary

When placing a large pcap (50 GB+) in the Malcolm upload folder, it doesn't seem to get processed. Testing the same procedure with a 1 GB pcap works just fine.

To reproduce

Steps to reproduce the behavior:

  1. Place a large pcap in the 'upload' folder (copied directly or via SFTP).
  2. The pcap-monitor_1 process renames (moves) the pcap to the 'processed' folder.
  3. Nothing happens: elasticsearch_1 doesn't create or update the index, no Zeek logs are created, and the size of the Elasticsearch index stays the same.

Expected behavior

Creating or updating the index, creation of Zeek logs, etc.

ISO installers don't use sane max swap partition size

In preseed_vmware.cfg

     .                                         \
     150% 150% 150% linux-swap                 \
       $defaultignore{ }                       \
       $lvmok{ }                               \
       in_vg { main } lv_name{ swap }          \
       method{ swap }                          \
       format{ }                               \
     .     

For hosts with large amounts of RAM this is a poor default, as it will create an enormous swap partition. I need to set a sane maximum.

ldap-analyzer doesn't work fully from Malcolm v2.0.5 and up

In Malcolm v2.0.5 we switched from the GCC toolchain to the LLVM toolchain. At that point, the ldap-analyzer stopped working correctly: the ldap.log file is no longer generated for write operations.

It turns out that if zeek and/or the plugin (not sure if it's one or the other) is compiled with a GCC toolchain it "works," but with an LLVM toolchain it doesn't.

I've distilled down a reproduction environment here: https://github.com/mmguero-dev/misc-debug/tree/main/zeek-ldap-analyzer

add CA certificate to elasticsearch jdk trust store for using self-signed SMTP servers

With the Alerting plugin in open distro for elasticsearch, it appears that using SSL/TLS with an SMTP server that uses a self-signed certificate (or any certificate with a custom CA) will not work.

It appears that such a CA can be imported by running a command such as:

keytool -importcert -file foobar.crt -alias "foobar" -keystore /usr/share/elasticsearch/jdk/lib/security/cacerts -keypass changeit -storepass changeit -noprompt

I have not tested this yet on the elasticsearch container, but I believe this is how it should be done. We are already importing CA root certificates from ./nginx/ca-trust for LDAP server validation; I don't think there would be any problem with using that same folder to contain this certificate and using its contents for both purposes.

ISO installers result in blank screen when booting with BIOS

Greetings,

When building Hedgehog, there are no obvious errors during generation of the ISO; however, when installing it on a host, it attempts to install Debian and then fails with a blank screen (no video output) after installing GRUB. The resulting install does not boot at all.

ISO built on a host using Ubuntu 18.04, Vagrant 2.2.10, and VirtualBox 5.2.42.

automated testing

From Malcolm created by mmguero: cisagov#26

Currently, Malcolm is tested manually by me on a per-change basis. As the project matures, I need to look into implementing some kind of test framework that can be run overnight (or similar) to ensure builds and functionality don't break without my knowing it.

"best guess" for identifying potential ICS/OT protocols

This outlines the new "best guess" feature for identifying potential ICS protocols.

There are many, many ICS (industrial control systems)/OT protocols and Malcolm parses a handful of them. A lot of them, particularly the more obscure or proprietary ones, are unlikely to ever be supported with a full parser. But it would be nice to identify more of them even without a full parser.

This feature involves a mapping/lookup file (in the Malcolm source under zeek/config/guess_ics_map.txt) and a zeek script (zeek/config/guess.zeek) that hooks on Zeek's connection close event and looks up the protocol (e.g., tcp or udp) and destination port and/or source port to make a "best guess" at whether a connection belongs to one of those protocols based on those values alone. Of course, this could mean a much higher false positive rate than usual, so these logs (which get written into bestguess.log) are only shown in their own dashboard (Best Guess under the ICS section of the dashboards navigation pane) with a disclaimer that they might have false positives. Values such as IP addresses, ports, or UID can be used to pivot to other dashboards to investigate further.

These are categorized by vendor, where possible.

As it's not likely that all users of Malcolm will want this enabled, the environment variable ZEEK_DISABLE_BEST_GUESS_ICS in docker-compose.yml is set to true by default, meaning the bestguess.log file won't be written. Clearing this value out (setting it to empty, '') will enable it.
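A minimal sketch of the lookup logic described above, using a simplified in-memory mapping rather than the actual contents of zeek/config/guess_ics_map.txt (the entries and function names here are illustrative, not the real implementation):

```python
# Illustrative mapping of (transport protocol, port) to a candidate ICS
# protocol name; the real guess_ics_map.txt is larger and vendor-categorized.
GUESS_MAP = {
    ("tcp", 502): "Modbus",
    ("tcp", 20000): "DNP3",
    ("udp", 47808): "BACnet",
}

def best_guess(proto: str, src_port: int, dst_port: int):
    """On connection close, guess an ICS protocol from ports alone.

    Checks the destination port first, then the source port; returns
    None when neither matches (the common case). Because this looks at
    port numbers only, false positives are expected."""
    return (GUESS_MAP.get((proto, dst_port))
            or GUESS_MAP.get((proto, src_port)))
```

Guessing from ports alone is why these results are quarantined to their own bestguess.log and Best Guess dashboard rather than mixed into the regular protocol logs.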

kibana offline maps server not started

Some change to the npm http-server package moved its binary from /usr/bin to /usr/local/bin, breaking the process as used in Malcolm. The new path needs to be reflected in the supervisord.conf config file in the kibana-helper image.

turning off AUTO_TAG feature disables tagging altogether

From Malcolm created by mmguero: cisagov#40

I ran across this the other day testing:

x-common-upload-variables: &common-upload-variables
  AUTO_TAG: 'false'

Setting AUTO_TAG to false in docker-compose.yml disables not only auto-tagging but also tags manually added via the upload web interface. That's not ideal.

transition from ElasticSearch to OpenSearch

This issue is for tracking the transition from ElasticSearch to the open source fork, OpenSearch.

Malcolm will be switching to the OpenSearch project as the basis of its search and analytics capabilities, mainly for two reasons:

  1. Elastic.co's decision to no longer release Elasticsearch and Kibana under an open source license
  2. Capabilities available under OpenSearch (and previously under Open Distro for Elasticsearch) that are only available with paid "premium" Elastic.co subscriptions (machine learning anomaly detection, alerting, reporting, etc.)

Historical context:

Moloch "Loading session packets" shows no response for a long time

  • docker-compose pull
  • ./start

After these steps, Moloch shows "Loading session packets" with no response for a long time.


Log output:

moloch_1         | Unsupported pcap file /data/pcap/processed/AUTOZEEK,AUTOCARVEall,test.pcapng link type 4294967295
moloch_1         | Sun, 01 Nov 2020 11:15:47 GMT aaa GET /moloch/session/201101-ywDxm2VLF9xDzIk-nIkIAf5a/packets?base=last&line=false&image=false&gzip=false&ts=false&decode=%7B%7D&packets=200&showFrames=false&showSrc=true&showDst=true - - bytes - ms
nginx-proxy_1    | 172.19.0.1 - aaa [01/Nov/2020:11:15:47 +0000] "GET /moloch/session/201101-ywDxm2VLF9xDzIk-nIkIAf5a/packets?base=last&line=false&image=false&gzip=false&ts=false&decode=%7B%7D&packets=200&showFrames=false&showSrc=true&showDst=true HTTP/1.1" 504 167 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Safari/605.1.15"
nginx-proxy_1    | 172.19.0.1 - - [01/Nov/2020:11:15:47 +0000] "GET /favicon.ico HTTP/1.1" 401 179 "https://localhost/moloch/session/201101-ywDxm2VLF9xDzIk-nIkIAf5a/packets?base=last&line=false&image=false&gzip=false&ts=false&decode=%7B%7D&packets=200&showFrames=false&showSrc=true&showDst=true" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Safari/605.1.15"

update Elastic to 7.9.2

Tracking updating elastic components from 7.6.2 to 7.9.2.

The unstable/development branch these commits are being done on before integration into the main branch can be found at https://github.com/mmguero-dev/Malcolm/tree/topic/elastic_new

  • elasticsearch updated and runs
  • logstash updated and runs (starts up faster actually)
  • filebeat (Malcolm) updated and runs
  • kibana updated and runs
  • kibana comments plugin builds/installs/works
  • kibana swimlane plugin
  • kibana sankey plugin
  • kibana elastalert plugin
  • kibana drilldown plugin
  • test elastalert
  • test moloch
  • protologbeat ("heatbeat") builds now, have not tested on Hedgehog yet
  • metricbeat tested on hedgehog
  • auditbeat tested on hedgehog
  • filebeat tested on hedgehog
  • filebeat (syslog) tested on hedgehog

Zeek log severity scoring

From Malcolm created by mmguero: cisagov#125

The idea is that we assign a severity rating to logs (all logs? some logs?)

So, imagine 1 - not severe at all (blue or green), 5 - super severe (red)

In Logstash enrichment we'd do stuff like:

  • cleartext password – 5
  • connection to naughty country – 5
  • certain notices – 5
  • insecure or old versions of protocols – 4
  • file transfers of certain MIME types – 3
  • connection within subnet – 1
  • connection to other subnet – 2
  • connection to outside world – 3
  • etc.

Of course those are just examples. I'd need to hammer out a real list.

Then in some of the dashboards, we can have "number of red events" "number of green events" etc.
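As a rough sketch of how such an enrichment might look, expressed in Python rather than Logstash configuration (the field names and rule conditions are illustrative stand-ins, not Malcolm's actual schema):

```python
# Hypothetical severity rules mirroring the examples above; each rule is a
# predicate over a log record plus a score from 1 (benign) to 5 (severe).
SEVERITY_RULES = [
    (lambda log: log.get("cleartext_password"), 5),
    (lambda log: log.get("dest_country") in {"XX"}, 5),   # placeholder country list
    (lambda log: log.get("protocol_version") in {"SSLv3", "TLSv1.0"}, 4),
    (lambda log: log.get("mime_type") == "application/x-dosexec", 3),
    (lambda log: log.get("direction") == "external", 3),
    (lambda log: log.get("direction") == "cross_subnet", 2),
    (lambda log: log.get("direction") == "internal", 1),
]

def severity(log: dict) -> int:
    """Return the highest severity of any matching rule, or 0 if none match."""
    return max((score for rule, score in SEVERITY_RULES if rule(log)), default=0)
```

Taking the maximum over matching rules means a connection that is both internal and carries a cleartext password still surfaces as a "red" event.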

rework kibana dashboards to use moloch fields where possible

From Malcolm created by mmguero: cisagov#65

Some data in the database exists in two fields: for example, zeek_gquic.user_agent and quic.useragent.

The Kibana dashboards right now generally use the Zeek versions. It would be better to rework them to use the Moloch fields, as that would allow more data to be visualized in Kibana (Moloch sessions vs. just Zeek logs).

LDAP Bind credentials world readable in docker

created from cisagov#171

🐛 Summary

LDAP Bind credentials in this file are readable by anyone. Can we put some permissions on the file when it gets created in the nginx entrypoint?

-rw-r--r-- /var/lib/docker/overlay2/*****/diff/etc/nginx/nginx_ldap_rt.conf

To reproduce

Steps to reproduce the behavior:

  1. Standard build with LDAP
  2. Grep overlay2 for bind credentials

Expected behavior

Readable only by user running nginx
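One way the entrypoint could be hardened, as a sketch: create the config under a restrictive umask so the credentials are never world-readable, even transiently. The path and file contents below are illustrative stand-ins, not the actual generated config.

```shell
# Hypothetical hardening for the nginx entrypoint: write the LDAP config
# with a restrictive umask so bind credentials are never world-readable.
CONF=/tmp/nginx_ldap_rt.conf      # stand-in for /etc/nginx/nginx_ldap_rt.conf
(
  umask 077                       # files created in this subshell get mode 0600
  printf 'binddn_passwd "********";\n' > "$CONF"
)
chmod 600 "$CONF"                 # belt-and-suspenders if the file pre-existed
stat -c '%a' "$CONF"
```

In the real container the file would also need to be owned by (or readable only to) the user nginx runs as.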


configuration/documentation for using another OpenSearch cluster rather than local docker

From Malcolm created by mmguero: cisagov#16

In some cases it will make more sense for people to use their own OpenSearch deployment rather than Malcolm's dockerized one; for example, to do a larger scale-out implementation with multiple data nodes, etc.

I'm going to try things out for this in a personal branch dedicated to this topic. Specifying the connection parameters (IP/port) should be pretty easy once things are normalized into a single source of environment variables in the compose file. I think the trick will be how to specify authentication information for all of the clients. These will include:

  • arkime
  • logstash
  • dashboards
  • dashboards-helper
  • pcap-monitor
  • api
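A hedged sketch of what that single source of environment variables might look like in docker-compose.yml; the variable names below are illustrative assumptions, not necessarily Malcolm's final ones:

```yaml
x-opensearch-variables: &opensearch-variables
  OPENSEARCH_URL: 'https://opensearch.example.org:9200'   # remote cluster endpoint
  OPENSEARCH_LOCAL: 'false'                               # skip the dockerized instance
  OPENSEARCH_SSL_CERTIFICATE_VERIFICATION: 'true'
```

Each of the clients listed above would then reference the same anchor, so the connection parameters are defined exactly once.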

write contributor's guide for source code contributions/modifications

should cover:

  • Zeek:
    • adding/enabling new zeek plugins
    • tweaking local.zeek
  • Logstash
    • parsing new Zeek log files
    • new enrichments
    • completely new data sources/log file types
    • how the dynamic pipelines stuff works
  • Moloch
    • adding new fields to WISE
  • Kibana
    • adding new dashboards
  • PCAP
    • adding a new PCAP processor
  • file carving
    • adding a new carved file scanner

I'll add more to this list as I think of it.

OpenSearch to Splunk export/searching capabilities

NetBox: integrate into Malcolm for asset inventory/management

From Malcolm created by robefernandez: cisagov#113

Congratulations on the project; it's really useful and easy to set up in just minutes using the scripts and Docker Compose.

I've just deployed the solution for testing, so I'm actually a newbie and need to spend more time discovering all the features, but I have a question that will be decisive for whether I continue using it for the moment:
Does it have asset inventory capabilities to list all the devices on the network?

I set the property LOGSTASH_OUI_LOOKUP to true (Logstash will map MAC addresses to vendors for all source and destination MAC addresses when analyzing Zeek logs).

Is there any dashboard or any place where we can obtain a list of the network devices?

Best regards.

provide more fine-tuned optimization variables in control_vars.conf for node.cfg to be used in zeekdeploy.sh

@ObsidianKnife suggested better ways to fine-tune node.cfg for performance as it's created by zeekdeploy.sh on the hedgehog. See cisagov#158

For now, I'm going to add the following to control_vars.conf:

export ZEEK_PIN_CPUS_LOGGER=
export ZEEK_PIN_CPUS_MANAGER=
export ZEEK_PIN_CPUS_PROXY=
# zeekdeploy.sh will also check and use (if present):
#   ZEEK_PIN_CPUS_WORKER_1 .. ZEEK_PIN_CPUS_WORKER_n
# where n is the number of capture interfaces

In addition, the following existed there previously:

export ZEEK_LB_PROCS=1
export ZEEK_LB_METHOD=custom

These variables will be used in creating node.cfg to add optional (only created if defined) pin_cpus sections for logger, manager, and proxy, and for each worker (1 .. n where n is the number of capture interfaces). Additionally, for the workers' lb_procs values, I will use the following order of preference, if they exist:

  1. ZEEK_LB_PROCS_WORKER_1 .. ZEEK_LB_PROCS_WORKER_n
  2. the number of pinned CPUs in ZEEK_PIN_CPUS_WORKER_1 .. ZEEK_PIN_CPUS_WORKER_n
  3. the value in ZEEK_LB_PROCS (defaults to 1)
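The order of preference above can be sketched as follows; the environment variable names match the issue, while the parsing details (comma-separated CPU lists, empty-string handling) are assumptions:

```python
import os

def lb_procs_for_worker(n: int) -> int:
    """lb_procs for worker n: per-worker override first, then the count of
    pinned CPUs for that worker, then the global ZEEK_LB_PROCS (default 1)."""
    explicit = os.environ.get(f"ZEEK_LB_PROCS_WORKER_{n}")
    if explicit:
        return int(explicit)
    pinned = os.environ.get(f"ZEEK_PIN_CPUS_WORKER_{n}", "")
    cpus = [c for c in pinned.split(",") if c.strip()]  # e.g. "2,3,4" -> 3 CPUs
    if cpus:
        return len(cpus)
    return int(os.environ.get("ZEEK_LB_PROCS", "1") or "1")
```

Deriving lb_procs from the pinned-CPU count keeps the two settings consistent without requiring the user to specify both.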

Zeek Intel Framework

From Malcolm created by cyamal1b4: cisagov#131

Greetings! I've been up and down in the docs for Zeek and understand decently well how to create intel in Zeek; however, that is largely aligned with a normal deployment of Zeek. I have loaded the required policies and pointed Zeek (in local.zeek) to my local Bro-formatted file with an indicator. I tested everything on the Try Zeek page with the same pcap, the same local.zeek changes, and the same Bro intel file. However, when I replicate this in Malcolm, I cannot get it (under the Kibana Zeek Intel or Notice dashboards) to pick up on my intel hit. I have noticed some differences in the deployment with Malcolm, so I figured it would be best to ask the developer directly. Thanks for your support and contribution!

LDAP parser broken (ldap.spicy:393 unset optional value) if built from source since May 31

I noticed that the LDAP plugin was broken for some of my PCAPs and did a git bisect on it. The commit zeek/spicy-analyzers@6896d49 works fine, but the commit zeek/spicy-analyzers@a44bc7c gives this error message for ldap.spicy:393:

1538420480.118397 analyzer error in /devel/github/mmguero-dev/spicy-analyzers/analyzer/protocol/ldap/ldap.spicy, line 393: unset optional value
1538420480.202406 analyzer error in /devel/github/mmguero-dev/spicy-analyzers/analyzer/protocol/ldap/ldap.spicy, line 393: unset optional value

I'm discussing it with the Zeek folks in Slack, but I'm thinking there's a bug in the switch-level parse-from (added, I believe, with zeek/spicy@0895997).

This is the PCAP I used to discover the issue.

For now I have reverted the commit (zeek/spicy-analyzers@a44bc7c) in my fork (mmguero-dev/spicy-analyzers@54bbd2d73f70b897902d399001246bc0c559f3d2) and am switching Malcolm's build instructions to build from that while we figure out why it broke upstream. That should be a temporary fix, anyway.

provide browser access to Zeek-extracted files directory (quarantined, preserved)

Zeek-extracted files can be preserved/"quarantined" based on scanning results, but there's no really convenient way to get at those files.

I've added optional environment variables for a new feature:

  • EXTRACTED_FILE_HTTP_SERVER_ENABLE – if set to true, the directory containing Zeek-extracted files will be served over HTTP at ./extracted-files/ (e.g., https://localhost/extracted-files/ if you are connecting locally)

  • EXTRACTED_FILE_HTTP_SERVER_ENCRYPT – if set to true, those Zeek-extracted files will be AES-256-CBC-encrypted in an openssl enc-compatible format (e.g., openssl enc -aes-256-cbc -d -in example.exe.encrypted -out example.exe)

  • EXTRACTED_FILE_HTTP_SERVER_KEY – specifies the AES-256-CBC decryption password for encrypted Zeek-extracted files; used in conjunction with EXTRACTED_FILE_HTTP_SERVER_ENCRYPT

The encryption is more for safety's sake than anything (as the files may contain live malware). It's a very no-frills HTTP server. It's disabled by default.
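A round-trip example of the openssl enc-compatible format described above (the password and filenames are illustrative; depending on the OpenSSL version, a key-derivation warning may be printed, but decryption still works):

```shell
# Encrypt a sample file the way EXTRACTED_FILE_HTTP_SERVER_ENCRYPT would,
# then decrypt it with the command from the description above and compare.
printf 'sample payload' > /tmp/example.bin
openssl enc -aes-256-cbc -salt -pass pass:infected \
  -in /tmp/example.bin -out /tmp/example.bin.encrypted
openssl enc -aes-256-cbc -d -pass pass:infected \
  -in /tmp/example.bin.encrypted -out /tmp/example.bin.decrypted
cmp /tmp/example.bin /tmp/example.bin.decrypted && echo OK
```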
