
spamscope / spamscope


Fast Advanced Spam Analysis Tool

Home Page: https://pypi.python.org/pypi/SpamScope

License: Apache License 2.0

Python 95.27% Clojure 0.19% Shell 0.10% Dockerfile 0.23% Makefile 0.89% Jinja 3.32%
security mail-analyzer spam-analyzer streamparse apache-storm application-security python outlook docker-image docker ansible-playbook ansible smtp dialect spamscope

spamscope's Introduction


SpamScope

Overview

SpamScope is an advanced spam analysis tool that uses Apache Storm with streamparse to process a stream of mails. To understand how SpamScope works, I suggest reading these overviews:

In general, the first step is to run Apache Storm; then you can run the topologies on it. SpamScope ships some topologies in the topologies folder, but you can build other topologies yourself.

Schema topology

Apache 2 Open Source License

SpamScope can be downloaded, used, and modified free of charge. It is available under the Apache 2 license.

Support the project

Dogecoin: DAUbDUttkf8WN1kwP9YYQQKyEJYY2WWtEG

Donate with Bitcoin

Donate

What Does SpamScope do?

SpamScope takes raw emails (both RFC822 and Outlook formats) as input and returns a JSON object. It extracts URLs and attachments (if attachments are zipped, it extracts the contained files). All information is saved in JSON objects. This is the first analysis. After that, SpamScope runs a phishing module that assigns a phishing score to each email.

You can then enable or disable post-processing modules that connect SpamScope with third-party tools. There are three main categories:

  • raw emails analysis
  • attachments analysis
  • sender emails analysis

It's possible to add new modules in these three categories if you want to connect SpamScope with other tools.

Raw emails analysis

These modules (see here) analyze the raw emails:

  • SMTP dialect
  • SpamAssassin

Attachments analysis

These modules (see here) analyze the attachments of emails:

  • Apache Tika
  • Store sample on disk (by default SpamScope saves samples inside the JSON objects)
  • Thug
  • VirusTotal
  • Zemana

Sender emails analysis

SpamScope can detect the exact sender IP address and then analyze it (see here):

  • Shodan
  • VirusTotal

Why should I use SpamScope?

  • It's very fast: the job is split into functionalities that work in parallel.
  • It's flexible: you can choose what SpamScope has to do.
  • It's distributed: SpamScope uses Apache Storm, a free and open source distributed realtime computation system.
  • It produces JSON output that you can store wherever you want.
  • It's easy to set up: there are Docker images and a docker-compose setup ready for use.
  • It's integrated with Apache Tika, VirusTotal, Thug, Shodan and SpamAssassin (for now).
  • It's free and open source (for special functions you can contact me).
  • It can analyze Outlook .msg files.

Distributed

SpamScope uses Apache Storm, which allows you to start small and scale horizontally as you grow. Simply add more workers.

Flexibility

You can choose your mail input sources (with spouts) and your functionalities (with bolts).

SpamScope comes with the following bolts:

  • tokenizer splits the mail into tokens such as headers, body, and attachments, and can filter out emails, attachments, and IP addresses already seen
  • phishing looks for your keywords in the email and connects the email to targets (a bank, your customers, etc.)
  • raw_mail is for third-party tools that analyze raw mails, like SpamAssassin
  • attachments analyzes all mail attachments with third-party tools like VirusTotal
  • network analyzes the sender IP addresses with third-party tools like Shodan
  • urls extracts all URLs from the email and its attachments
  • json_maker and outputs build the JSON report and save it

Store where you want

You can build your custom output bolts and store your data in Elasticsearch, MongoDB, filesystem, etc.
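As an illustration, a custom output bolt can be a few lines of streamparse code. The sketch below appends each report to a file on disk; the class name, output path, and tuple layout are assumptions for the example, not SpamScope's actual bolts.

import json

from streamparse import Bolt


class JsonFileOutputBolt(Bolt):
    """Hypothetical output bolt: append every JSON report to a file on disk."""

    outputs = []

    def initialize(self, storm_conf, context):
        # Assumed location; in a real bolt, read this from the configuration.
        self._path = "/var/lib/spamscope/reports.jsonl"

    def process(self, tup):
        report = tup.values[0]  # assumed: the first tuple value is the report
        with open(self._path, "a") as f:
            f.write(json.dumps(report) + "\n")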

Build your topology

With streamparse you can build your topology in Python, adding and/or removing spouts and bolts.
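For example, a minimal streamparse topology definition looks roughly like this; the spout and bolt classes are hypothetical stand-ins for SpamScope's real ones.

from streamparse import Topology

# Hypothetical imports for illustration only; SpamScope's real spouts and
# bolts live in its spouts/ and bolts/ packages.
from spouts.files import MailFilesSpout
from bolts.tokenizer import TokenizerBolt
from bolts.json_maker import JsonMakerBolt


class MailTopology(Topology):
    mail_spout = MailFilesSpout.spec(name="files-mails")
    tokenizer = TokenizerBolt.spec(name="tokenizer", inputs=[mail_spout], par=1)
    json_maker = JsonMakerBolt.spec(name="json-maker", inputs=[tokenizer])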

API

For now SpamScope doesn't have its own API, because it isn't tied to any technology. If you use Redis as spout (input), you'll use the Redis API to put mails into the topology. If you use Elasticsearch as output, you'll use the Elasticsearch API to get the results.

It would be possible to develop a middleware API that talks to the input and output and changes the configuration, but for now there isn't one.
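For example, if your spout reads raw mails from a Redis list, feeding the topology could look like the sketch below; the key name "spamscope:mails" is an assumption, and the real key and format depend on your spout configuration.

import redis

r = redis.StrictRedis(host="localhost", port=6379, db=0)

with open("sample.eml", "rb") as f:
    raw_mail = f.read()

# "spamscope:mails" is a placeholder key, not a fixed SpamScope name.
r.rpush("spamscope:mails", raw_mail)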

SpamScope on Web

Authors

Main Author

Fedele Mantuano (LinkedIn: Fedele Mantuano)

Requirements

For operating system requirements you can read the Ansible playbooks, which go into detail.

For Python requirements you can read:

Thug is another optional requirement that is not in the requirements file. See the Thug section for more details.

Apache Storm

Apache Storm is a free and open source distributed realtime computation system.

streamparse

streamparse lets you run Python code against real-time streams of data via Apache Storm.

mail-parser

mail-parser is the raw email parser used by SpamScope.
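A quick mail-parser usage sketch (the file name is a placeholder):

import mailparser

mail = mailparser.parse_from_file("sample.eml")  # or parse_from_string(raw)

print(mail.subject)           # decoded Subject header
print(mail.from_)             # sender, as a list of (name, address) pairs
print(len(mail.attachments))  # attachments as a list of dicts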

Faup

Faup stands for Finally An Url Parser and is a library and command line tool to parse URLs and normalize fields.

rarlinux (optional)

rarlinux unarchives RAR files.

SpamAssassin (optional)

SpamScope can use SpamAssassin, an open source anti-spam tool, to analyze every mail.

Apache Tika (optional)

SpamScope can use Apache Tika to parse every attachment. The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). To use Apache Tika in SpamScope you must install tika-app-python with pip and Apache Tika itself.
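A short tika-app-python sketch, assuming a local copy of the Tika app jar; check the tika-app-python README for the exact API of your version.

from tikapp import TikaApp

# The jar location is an assumption; point it at your own Tika install.
tika = TikaApp(file_jar="/opt/tika/tika-app.jar")

print(tika.detect_content_type("attachment.pdf"))
print(tika.extract_only_content("attachment.pdf"))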

Thug (optional)

Since release v1.3 SpamScope can analyze JavaScript and HTML attachments with Thug. If you want to analyze attachments with Thug, follow these instructions to install it, then enable it in the attachments section of the main configuration file.

What is Thug? From the project README:

Thug is a Python low-interaction honeyclient aimed at mimicking the behavior of a web browser in order to detect and emulate malicious contents.

You can see a complete SpamScope report with Thug analysis here.

Thug analysis can be very slow and you can get heartbeat timeout errors in Apache Storm. To avoid issues, set supervisor.worker.timeout.secs so that:

number of user agents * timeout_thug < supervisor.worker.timeout.secs

For example, with 3 enabled user agents and a Thug timeout of 180 seconds per agent, supervisor.worker.timeout.secs should be greater than 540. The best value for the threshold is 1.

VirusTotal (optional)

It's possible to add the VirusTotal report to the results (for mail attachments and the sender IP address). You need a private API key.
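The lookup itself is an ordinary VirusTotal API call; a minimal sketch against the public v2 file-report endpoint (not SpamScope's internal code) looks like this:

import requests

API_KEY = "your key"
sha256 = "..."  # hash of the attachment to look up

resp = requests.get(
    "https://www.virustotal.com/vtapi/v2/file/report",
    params={"apikey": API_KEY, "resource": sha256},
)
report = resp.json()
print(report.get("positives"), "/", report.get("total"))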

Shodan (optional)

It's possible to add the Shodan report for the sender IP address to the results. You need a private API key.
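A minimal Shodan lookup with the official Python library (a sketch, not SpamScope's internal code):

import shodan

api = shodan.Shodan("your key")
host = api.host("8.8.8.8")  # sender IP address to look up

print(host.get("org"), host.get("ports"))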

Elasticsearch (optional)

It's possible to store the results in Elasticsearch. In this case you should install the elasticsearch package.
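Once stored, the results can be queried with the official client. A sketch, assuming a local cluster and the spamscope_mails-* index naming seen in the project's Elasticsearch output (the body-style query may differ between elasticsearch-py versions):

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

res = es.search(index="spamscope_mails-*",
                body={"query": {"match_all": {}}},
                size=5)
for hit in res["hits"]["hits"]:
    print(hit["_id"])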

Redis (optional)

It's possible to store the results in Redis. In this case you should install the redis package.

Configuration

Read the example main configuration file. By default SpamScope looks for the configuration file at /etc/spamscope/spamscope.yml, but it's possible to set the SPAMSCOPE_CONF_FILE environment variable:

$ export SPAMSCOPE_CONF_FILE=/etc/spamscope/spamscope.yml

When you change the configuration file, SpamScope automatically reloads the changes.

Installation

You can use:

Topologies

SpamScope comes with six topologies:

If you want to submit a SpamScope topology, use the spamscope-topology submit tool. For more details see the SpamScope CLI tools:

$ spamscope-topology submit --topology {spamscope_debug,spamscope_elasticsearch,spamscope_redis}

It's possible to change the default settings for all Apache Storm options. I suggest changing these options:

  • topology.tick.tuple.freq.secs: how often all bolts reload their configuration
  • topology.max.spout.pending: the Apache Storm framework will throttle your spout as needed to stay within this limit
  • topology.sleep.spout.wait.strategy.time.ms: maximum sleep before emitting a new tuple (mail)

You can use spamscope-topology submit to apply these changes.

Important

If you are using the Elasticsearch output, I suggest using the Elasticsearch templates that come with SpamScope.

Unittest

SpamScope comes with unit tests for each module. Bolts and spouts contain no special features; all the intelligence is in external modules. All unit tests are in the tests folder.

To run the complete test suite you should set the following environment variables:

$ export THUG_ENABLED=True
$ export VIRUSTOTAL_ENABLED=True
$ export VIRUSTOTAL_APIKEY="your key"
$ export ZEMANA_ENABLED=True
$ export ZEMANA_APIKEY="your key"
$ export ZEMANA_PARTNERID="your partner id"
$ export ZEMANA_USERID="your userid"
$ export SHODAN_ENABLED=True
$ export SHODAN_APIKEY="your key"
$ export SPAMASSASSIN_ENABLED=True

Output example

This is a raw email that I analyzed with SpamScope:

This is another example with Thug analysis.

Screenshots

Apache Storm

SpamScope

SpamScope Topology

SpamScope Map

spamscope's People

Contributors

antoinet, fedelemantuano, sylencecc



spamscope's Issues

Not picking up emails and index not recognized in Kibana

I have installed and configured everything using docker-compose.

I have no errors in the storm UI.

I assume the problem is that no mail seems to be picked up, as the example email file remains in the folder, and the Kibana instance doesn't recognize the index suggested in my config file.

I have placed a raw email example (as located here) in the "/mnt/mails" folder of my host.

my .env file looks like this:

CLUSTER_NAME=spamscope-cluster
DOCKER_MAILS_FOLDER=/mnt/mails
ELASTIC_DATA=/usr/share/elasticsearch/data
ELASTIC_MEM_LIMIT=2g
ELK_BIND_IP=127.0.0.1
ELK_TAG=5.6.3
HEAP_SIZE=1024m
HOST_MAILS_FOLDER=/mnt/mails
HOST_SPAMSCOPE_CONF=/etc/spamscope/
KIBANA_MEM_LIMIT=2g
NET_NAME=esnet
NODE_NAME=spamscope
SPAMSCOPE_BIND_IP=127.0.0.1
SPAMSCOPE_IMAGE_NAME=fmantuano/spamscope-elasticsearch
SPAMSCOPE_MEM_LIMIT=4g

Where should I start troubleshooting?

Thanks for your time.

SpamAssassin returns empty dictionary

With certain emails, the output of SpamScope shows SpamAssassin as an empty dictionary. If I run the email through the SpamAssassin CLI with spamassassin -t, it parses fine.

Is there any reason why SpamScope is returning an empty dictionary when it should not?

Thank you

Consider swapping out tika-app with tika-python

The Tika Python library uses the REST server (which is faster than CMD line calls in Java to Tika APP since the REST server doesn't need to reload Tika config and the JVM each time). In addition you don't need to worry about the location of the Tika jar file (and install it separately). It will manage all that for you.

Looks like you would just update requirements.txt to use pip install tika, and then make whatever necessary updates. If you want I can send a PR.
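For reference, a minimal tika-python call looks roughly like this (a sketch; the library starts and talks to the Tika REST server behind the scenes):

from tika import parser

parsed = parser.from_file("attachment.pdf")
print(parsed.get("metadata", {}).get("Content-Type"))
print((parsed.get("content") or "")[:200])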

Split actual output in: JSON mails and JSON attachments

Split result in two parts:

  • mail result with all fields except details of attachments (only hashes)
  • attachment result with all attachment details

Store only one sample per hash, and attach the Tika and VirusTotal analysis only once per hash.

Sender IP is always NULL

Question:

I have noticed that the sender IP is always null in the JSON output. The sender IP is in fact in the original email. Is there a way I can change this, or is it expected behavior? I would like to add more lookups (other than VirusTotal and Shodan) but want to make sure I'm looking in the right place.

Sometimes it is also in the "Return-Path" header.

Thank you

Configuration defaults will be used due to OSError

Errors appearing in the spamscope_debug worker.log:

2018-10-11 08:54:22.519 phishing Thread-42 [INFO] /opt/spamscope/venv/local/lib/python2.7/site-packages/astropy/config/configuration.py:541: ConfigurationMissingWarning: Configuration defaults will be used due to OSError:Could not find unix home directory to search for astropy config dir on None
warn(ConfigurationMissingWarning(msg))

(The same warning is repeated by Thread-37 through Thread-44.)

Unsure about data input

Hi, I'm a student and I wanted to try SpamScope as it looks comprehensive at processing bulk mail and stripping it down. It's the latest version of SpamScope available on your Git as well.

I'm a bit confused about how to feed data into this, though. I have a Docker image with Apache Storm running, and I'm using the spamscope-debug topology to store the output on the filesystem.

From there, I'm not very sure how to feed data into it; I have some email headers that I want to pass in for processing. I understand it has to do with Apache Storm spouts, but I've never used them before and some guidance would be appreciated! Would it be possible for it to take in a set of email files located in a folder, for example?

Thank you in advance also! :)


Unable to convert Float

Unable to convert the value to float. I added a try and except around it to fix it. For your awareness.

Traceback (most recent call last):
  File "/opt/spamscope/venv/local/lib/python2.7/site-packages/pystorm/component.py", line 488, in run
    self._run()
  File "/opt/spamscope/venv/local/lib/python2.7/site-packages/pystorm/bolt.py", line 197, in _run
    self.process(tup)
  File "/var/lib/storm/supervisor/stormdist/spamscope_debug-1-1539177259/resources/bolts/raw_mail.py", line 50, in process
    p(self.conf[p.__name__], raw_mail, mail_type, results)
  File "/var/lib/storm/supervisor/stormdist/spamscope_debug-1-1539177259/resources/modules/mails/post_processing.py", line 93, in spamassassin
    results["spamassassin"] = spamassassin[mail_type](raw_mail)
  File "/var/lib/storm/supervisor/stormdist/spamscope_debug-1-1539177259/resources/modules/mails/spamassassin_analysis.py", line 90, in report_from_file
    return obj_report(mail)
  File "/var/lib/storm/supervisor/stormdist/spamscope_debug-1-1539177259/resources/modules/mails/spamassassin_analysis.py", line 56, in obj_report
    details = convert_ascii2json(t)
  File "/var/lib/storm/supervisor/stormdist/spamscope_debug-1-1539177259/resources/modules/mails/spamassassin_analysis.py", line 141, in convert_ascii2json
    "pts": float(row[0]),
ValueError: could not convert string to float: [SPF
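A sketch of the reporter's workaround (not the project's official fix): guard the float conversion so SpamAssassin report rows whose first column is not a numeric score (e.g. "[SPF") are skipped instead of crashing the raw_mail bolt. The row layout is assumed from the traceback above.

def parse_rule_row(row):
    try:
        pts = float(row[0])
    except (ValueError, IndexError):
        return None  # not a "pts rule description" row; skip it
    return {
        "pts": pts,
        "rule": row[1] if len(row) > 1 else None,
        "description": " ".join(row[2:]),
    }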

Manage "Rejecting mapping update" errors in Elasticsearch bolt

Manage the "Rejecting mapping update to [index]" error in bulk indexing:

2018-10-08 14:16:17.276 o.a.s.d.executor Thread-46 [ERROR]
java.lang.Exception: Shell Process Exception: Python BulkIndexError raised while processing Tuple Tuple(id=u'9016744204847506491', component=u'__system', stream=u'__tick', task=-1, values=(60,))
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/pystorm/component.py", line 488, in run
    self._run()
  File "/usr/local/lib/python2.7/dist-packages/pystorm/bolt.py", line 193, in _run
    self.process_tick(tup)
  File "/hadoop/storm/supervisor/stormdist/spamscope_elasticsearch-1-1538989492/resources/bolts/output_elasticsearch.py", line 106, in process_tick
    self.flush()
  File "/hadoop/storm/supervisor/stormdist/spamscope_elasticsearch-1-1538989492/resources/bolts/output_elasticsearch.py", line 60, in flush
    helpers.bulk(self._es, self._mails)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 257, in bulk
    for ok, item in streaming_bulk(client, actions, *args, **kwargs):
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 192, in streaming_bulk
    raise_on_error, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/helpers/__init__.py", line 137, in _process_bulk_chunk
    raise BulkIndexError('%i document(s) failed to index.' % len(errors), errors)
BulkIndexError: (u'2 document(s) failed to index.', [{u'index': {u'status': 400, u'_type': u'analysis', u'_index': u'spamscope_mails-2018.10.08', u'error': {u'reason': u'Rejecting mapping update to [spamscope_mails-2018.10.08] as the final mapping would have more than 1 type: [_doc, analysis]', u'type': u'illegal_argument_exception'}, u'_id': u'JVqbU2YBiKy7cYIvRinK', u'data': {u'return-path': u

Java errors in Storm UI after installation

Receiving these errors once installation was completed via Ansible. It looks like I didn't install something or forgot a step in the installation.


java.lang.RuntimeException: org.apache.storm.multilang.NoOutputException: Pipe to subprocess seems to be broken! No output read. Serializer Exception: /opt/spamscope/venv/local/lib/python2.7/site-pack
--

Any help would be appreciated!

Exception in phishing analysis for mail with multiple subject headers

When attempting to analyse a mail with multiple subject headers, such as the one in this gist (which is sort of invalid, but may happen anyway), with the phishing bolt bolts/phishing.py, the following exception occurs:

  File "/usr/local/lib/python3.6/dist-packages/pystorm/component.py", line 488, in run
    self._run()
  File "/usr/local/lib/python3.6/dist-packages/pystorm/bolt.py", line 197, in _run
    self.process(tup)
  File "/data/supervisor/stormdist/spamscope_analysis-1-1617302134/resources/bolts/phishing.py", line 92, in process
    self._mails.pop(sha256_random))
  File "/data/supervisor/stormdist/spamscope_analysis-1-1617302134/resources/bolts/phishing.py", line 71, in _phishing
    subject_keys=self.subject_keys)
  File "/data/supervisor/stormdist/spamscope_analysis-1-1617302134/resources/modules/mails/phishing.py", line 147, in check_phishing
    if swt(subject, subject_keys):
  File "/data/supervisor/stormdist/spamscope_analysis-1-1617302134/resources/modules/utils.py", line 196, in search_words_in_text
    text = text.lower()
AttributeError: 'list' object has no attribute 'lower'

The reason is that in such a case the underlying mail-parser returns a list with all encountered subject values instead of a string:

>>> m = mailparser.parse_from_string("...")
>>> m.mail_partial.get('subject')
['195.133.49.168 e HMUth', 'Potenzmittel GRATIS testen   🔥    🔥    🔥']
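A possible defensive workaround (a sketch, not the project's actual patch) is to normalize the subject to a single string before keyword matching, since mail-parser may return a list when a mail carries several Subject headers:

def normalize_subject(subject):
    # mail-parser returns a list when multiple Subject headers are present.
    if isinstance(subject, (list, tuple)):
        return " ".join(str(s) for s in subject)
    return subject or ""

search_words_in_text could then be called with normalize_subject(subject) instead of the raw value.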

Serializer Exception & Pipe Broken

A fresh Docker image of SpamScope brings up this error.

org.apache.storm.multilang.NoOutputException: Pipe to subprocess seems to be broken! No output read. Serializer Exception: /usr/local/lib/python2.7/dist-packages/astropy/config/configuration.py

I have already tried the debug and debug-iter topologies, and both have this error. Upon submitting a mail to be analysed, the mail is analysed and put into /tmp/failed, and this error remains. No output is received.

Help would be appreciated on this matter.
