
binaryalert's Introduction

BinaryAlert: Serverless, Real-Time & Retroactive Malware Detection


BinaryAlert is an open-source serverless AWS pipeline where any file uploaded to an S3 bucket is immediately scanned with a configurable set of YARA rules. An alert will fire as soon as any match is found, giving an incident response team the ability to quickly contain the threat before it spreads.

Read the documentation at binaryalert.io!

binaryalert's People

Contributors

austinbyers, fusionrace, goochi1, jacknagz, jalewis, ljharb, mmwtsn, mtmcgrew, ryandeivert, ryxias, sid77, twaldear


binaryalert's Issues

unit tests failing with latest yara rules

Background

It looks like the latest build of Neo23x0's YARA ruleset is breaking this build. From what I've been able to find, there may be a version mismatch with the yara-python packages used. Any time a YARA rule has a condition that calls pe.imphash(), the unit tests fail on build_analyzer and compile_rules with yara.SyntaxError: invalid field name "imphash".
I've tried cloning a fresh copy of everything and rebuilding from scratch, but I get the same error. I've also tried pulling down the latest YARA repos, but no joy there either.

Has anyone successfully implemented newly released yara rules on this build?

Add support for `hash` YARA module

The pre-built yara-python_3.6.3.zip does not have support for the hash module. Adding YARA rules that use the hash module results in the following Lambda error when analyzing in production:

internal error: 34: Error
Traceback (most recent call last):
  File "/var/task/main.py", line 76, in analyze_lambda_handler
    with binary_info.BinaryInfo(os.environ['S3_BUCKET_NAME'], s3_key, ANALYZER) as binary:
  File "/var/task/binary_info.py", line 57, in __enter__
    self.download_path, original_target_path=self.observed_path)
  File "/var/task/yara_analyzer.py", line 52, in analyze
    return self._rules.match(target_file, externals=self._yara_variables(original_target_path))
yara.Error: internal error: 34
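
A minimal probe (an assumption, not project code) for checking whether the bundled yara-python build supports the hash module; a build without it fails in the same way as the traceback above:

import yara

# Rule that exercises the "hash" module; harmless if the module is present.
RULE_SOURCE = 'import "hash" rule hash_probe { condition: hash.md5(0, filesize) != "" }'

try:
    rules = yara.compile(source=RULE_SOURCE)
    rules.match(data=b'probe')
    print('hash module is available')
except yara.Error as err:  # yara.SyntaxError is a subclass of yara.Error
    print(f'hash module unavailable: {err}')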

Enforce test coverage

Use coveralls and/or the coverage pip package to enforce a minimum level of test coverage in CI

Mock out loggers during unit tests

There should be no extraneous print statements during unit tests; the loggers can be mocked out. As an added benefit, the mocked loggers can verify that the correct messages were logged.
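
A self-contained sketch of the approach (the function being tested is hypothetical), using unittest.mock:

import logging
import unittest
from unittest import mock

LOGGER = logging.getLogger(__name__)

def analyze():
    """Hypothetical function that logs as a side effect."""
    LOGGER.info('Analyzed %d binaries', 3)

class LoggerMockTest(unittest.TestCase):
    """Patch the module-level LOGGER: tests stay quiet and can assert on calls."""
    @mock.patch(__name__ + '.LOGGER')
    def test_analyze_logs(self, mock_logger):
        analyze()
        mock_logger.info.assert_called_once_with('Analyzed %d binaries', 3)

if __name__ == '__main__':
    unittest.main()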

Analyzers should remove invalid binaries

Analyzers can process binaries in batches, but the whole batch fails if any single binary cannot be downloaded or analyzed (e.g. a timeout or a missing file).

Instead, analyzers should log an error and continue, possibly removing the offending binary from SQS.
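
A sketch of that behavior (the queue URL and message format are placeholders, and `analyze` stands in for the real per-binary work):

import logging

import boto3

LOGGER = logging.getLogger(__name__)
SQS = boto3.client('sqs')

def process_batch(queue_url: str, messages: list, analyze) -> None:
    """Analyze each binary independently so one failure can't sink the batch."""
    for msg in messages:
        try:
            analyze(msg['Body'])
        except Exception:  # e.g. S3 download timeout or missing object
            LOGGER.exception('Skipping invalid binary: %s', msg['Body'])
        # Remove the message either way so it is not redelivered forever.
        SQS.delete_message(QueueUrl=queue_url, ReceiptHandle=msg['ReceiptHandle'])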

support for terraform 11

Background

The latest version of Terraform is 0.11.0. When I attempt a deploy I get:

$ ./manage.py deploy
......................................................................

Ran 70 tests in 3.314s

OK
Creating analyzer deploy package...
Creating batcher deploy package...
Creating dispatcher deploy package...
Initializing modules...

- module.binaryalert_downloader
- module.binaryalert_batcher
- module.binaryalert_dispatcher
- module.binaryalert_analyzer

The currently running version of Terraform doesn't meet the
version requirements explicitly specified by the configuration.
Please use the required version or update the configuration.
Note that version requirements are usually set for a reason, so
we recommend verifying with whoever set the version requirements
prior to making any manual changes.

  Module: root
  Required version: ~> 0.10.4
  Current version: 0.11.0

Desired Change

Would it be possible to update the Terraform configuration so it runs with version 0.11.0?

Cheers.

Batcher fails if the bucket is empty

If the S3 bucket is completely empty, the batcher raises an exception. This should be fixed, and unit tests should prevent a regression.
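
One likely fix, sketched under the assumption that the batcher enumerates the bucket with list_objects_v2: pages for an empty bucket omit the 'Contents' key entirely, so the enumeration should default to an empty list.

import boto3

S3 = boto3.client('s3')

def enumerate_keys(bucket: str):
    """Yield every object key; yields nothing (no exception) for an empty bucket."""
    paginator = S3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get('Contents', []):  # 'Contents' is absent when empty
            yield obj['Key']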

Fail without a name prefix

Every deployment of BinaryAlert needs a unique name prefix in terraform.tfvars. While the instructions say to include a prefix, it defaults to "" if none is provided. An attempt to deploy will then nearly succeed, failing only where its resource names collide with another deployment that also used an empty prefix.

Instead, manage.py should check for the name prefix and refuse to deploy if no unique name is provided. Alternatively, a random prefix could be auto-generated.

Use higher-level boto3 resources

The Lambda code usually uses a raw boto3.client instead of the higher-level resources that are available. For example, boto3.resource('dynamodb').Table(...).query could replace the more complicated boto3.client('dynamodb').query in the analyzer.
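
A sketch of the difference (table and key names are placeholders): the resource API accepts plain Python types, whereas the low-level client requires typed attribute dictionaries like {'S': 'value'}.

import boto3
from boto3.dynamodb.conditions import Key

# Higher-level resource API: no manual attribute-type wrapping needed.
table = boto3.resource('dynamodb').Table('BinaryAlertMatches')
response = table.query(KeyConditionExpression=Key('SHA256').eq('abc123'))
items = response['Items']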

internal error: 34: Error

Note: This issue was created for posterity purposes only.

Usually this error happens when one of your YARA rules uses a module that isn't yet supported.

Ex: import "hash" - the hash module isn't supported (yet).

If you're not doing this, file a new bug, as it's likely something else :)

Support for multiple parser/analyzer types

Enhance the Dispatcher to support multiple types of "Analyzer" Lambdas based on the type of file to be processed. This would support a generalized file analysis platform that could handle binary files in multiple ways (YARA or other static/dynamic analysis) as well as other forensic artifacts (configuration files, memory captures, etc.).

One possible approach would be to have the Dispatcher retrieve the S3 metadata for each object, where a key:value pair names the specific parser to use (by name, or perhaps a substring of its ARN). The Dispatcher would use the Lambda API (ListFunctions) to discover which Lambdas it has access to, then dispatch the file to one (or more) of the Lambdas that match the metadata and are available.
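
A sketch of that flow (the 'analyzer' metadata key, payload shape, and naming convention are all hypothetical):

import json

import boto3

S3 = boto3.client('s3')
LAMBDA = boto3.client('lambda')

def dispatch(bucket: str, key: str) -> None:
    """Route an object to analyzers named by its (hypothetical) metadata."""
    hint = S3.head_object(Bucket=bucket, Key=key)['Metadata'].get('analyzer', 'yara')
    functions = LAMBDA.list_functions()['Functions']  # first page only; paginate in practice
    for function in functions:
        if hint in function['FunctionName']:
            LAMBDA.invoke(FunctionName=function['FunctionName'],
                          InvocationType='Event',  # asynchronous
                          Payload=json.dumps({'bucket': bucket, 'key': key}))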

yara.Error: could not map file into memory

Some users are seeing the following error in the analyzer Lambda logs:

could not map file "/tmp/binaryalert_UUID" into memory: Error
Traceback (most recent call last):
  File "/var/task/main.py", line 76, in analyze_lambda_handler
    with binary_info.BinaryInfo(os.environ['S3_BUCKET_NAME'], s3_key, ANALYZER) as binary:
  File "/var/task/binary_info.py", line 57, in __enter__
    self.download_path, original_target_path=self.observed_path)
  File "/var/task/yara_analyzer.py", line 52, in analyze
    return self._rules.match(target_file, externals=self._yara_variables(original_target_path))
yara.Error: could not map file "/tmp/binaryalert_UUID" into memory

I have not been able to reproduce this locally, even with 20,000 YARA rules scanning a 10 GB file. Some theories:

  • The Lambda analyzers need more memory
  • Lambda handles virtual memory differently; YARA consumes lots of virtual memory even though its actual memory usage is fairly efficient

Use type annotations

Python 3.6 supports type annotations, which allow IDEs and test frameworks to verify types explicitly.

We should also explore enforcing type checks via mypy or a similar library until the Python interpreter itself can type-check.
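
A minimal illustration: with an annotated signature, running mypy over the file flags callers that pass the wrong type.

import hashlib

def compute_sha256(file_path: str) -> str:
    """Annotated signature: mypy or an IDE can check callers statically."""
    with open(file_path, 'rb') as file_obj:
        return hashlib.sha256(file_obj.read()).hexdigest()

# mypy would flag this call: expected "str", got "int"
# compute_sha256(1234)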

Provide low-throughput alternative for dispatcher cron

Background

By default, the dispatcher is invoked every minute. While this interval is configurable, a dispatcher cron doesn't make much sense for low-throughput deployments: you either pay for many wasted dispatcher invocations or wait a long time before a binary is processed.

Options

You could let the S3 event notification invoke the analyzer directly (bypassing the queue entirely), but the queue is essentially required for retroactive analysis.

Instead, one suggestion was to trigger the dispatcher from an SQS metric alarm rather than a cron schedule: for example, invoke the dispatcher whenever SQS ApproximateNumberOfMessagesVisible > 0. This might introduce a delay of several minutes, but it would be a reasonable compromise between cost and time-to-analysis.

Discussion welcome!

Add live test option to the CLI

To verify that a live BinaryAlert deployment is working, add a CLI command that uploads an EICAR test file and triggers a real alert.

E.g. python3 manage.py live_test would upload the test file and fire an alert.
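
A sketch of what the command could do (the bucket name and key are placeholders):

import boto3

# The EICAR test string is a standard, harmless antivirus test pattern.
EICAR = 'X5O!P%@AP[4\\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*'

def live_test(bucket_name: str) -> None:
    """Upload the EICAR file; a healthy deployment should alert on it."""
    boto3.client('s3').put_object(
        Bucket=bucket_name, Key='eicar_test.txt', Body=EICAR.encode('utf-8'))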

Upgrade yara-python to 3.7.0

Background

The newest version of YARA, v3.7.0, was released earlier today.

The new version includes some minor bugfixes and a new integrity check for compiled rules.

Desired Change

Upgrade requirements as well as the pre-compiled yara-python.zip to v3.7.0 of YARA.

Additional 'name_prefix' validation

#40 verifies that the user-specified name_prefix is non-empty, but #41 shows that additional validation is necessary (e.g. . is not a valid character for some resource names).

We should add further validation of the name prefix in the CLI (likely allowing only alphanumeric and underscore characters).
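
A sketch of the proposed check (the exact character set is an assumption):

import re

def validate_name_prefix(prefix: str) -> None:
    """Reject prefixes that would produce invalid AWS resource names."""
    if not re.fullmatch(r'[a-z0-9_]+', prefix):
        raise ValueError(f'Invalid name_prefix: {prefix!r} '
                         '(only lowercase letters, digits, and underscores)')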

manage.py run issue

Background

Running manage.py configure gives the following error:

UK-C02T55NPFVH4:binaryalert sgooch$ python ./manage.py configure
  File "./manage.py", line 60
    def _get_input(prompt: str, default_value: str) -> str:

Add --version flag

When using the ./manage.py script, it would be helpful to have a --version flag so users can keep track of which BinaryAlert version they are running.
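
A sketch using argparse's built-in version action (the version constant is a placeholder):

import argparse

VERSION = '1.1.0'  # placeholder; the real version would live in one place

parser = argparse.ArgumentParser(prog='manage.py')
parser.add_argument('--version', action='version',
                    version=f'BinaryAlert {VERSION}')
args = parser.parse_args()  # `./manage.py --version` prints the version and exits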

Stop a batch analysis / prevent duplicate batches

Currently, a batch operation (manage.py analyze_all) is run after every deploy. This leads to a few problems:

  1. If the deploy was bad (e.g. unsupported YARA rules break the analyzers), there is no way to stop the batch analysis which has already started
  2. Two deploys in rapid succession will start two separate batch operations.

The solution will most likely be an entry in the DynamoDB table indicating whether a batch analysis is currently in progress. Edge cases will need care, however (e.g. what happens if the batcher fails and the flag is never cleared?).
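
A sketch of the locking step (the table and key names are hypothetical): a conditional put means only one caller can start a batch.

import boto3
from botocore.exceptions import ClientError

TABLE = boto3.resource('dynamodb').Table('BinaryAlertBatchState')  # hypothetical

def try_start_batch() -> bool:
    """Return True if this caller acquired the batch lock, False otherwise."""
    try:
        TABLE.put_item(Item={'LockKey': 'batch_in_progress'},
                       ConditionExpression='attribute_not_exists(LockKey)')
        return True
    except ClientError as err:
        if err.response['Error']['Code'] == 'ConditionalCheckFailedException':
            return False  # a batch is already running
        raise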

Use LAMBDA_TASK_ROOT when applicable

Background

BinaryAlert's Python source files use __file__ to find files relative to their own location, but Lambda provides the LAMBDA_TASK_ROOT environment variable which contains the location of the source code running in Lambda.

Desired Change

Use LAMBDA_TASK_ROOT instead of __file__ when applicable. This will make the code easier to read.
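
A sketch of the change (the filename is a hypothetical example):

import os

# Prefer the Lambda-provided root; fall back to __file__ for local unit tests.
TASK_ROOT = os.environ.get(
    'LAMBDA_TASK_ROOT', os.path.dirname(os.path.abspath(__file__)))
COMPILED_RULES_PATH = os.path.join(TASK_ROOT, 'compiled_yara_rules.bin')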

manage terraform destroy option

Background

There is no option in manage.py to call terraform destroy.
Running it manually via Terraform is not successful either: it cannot delete the S3, SNS, and SQS resources.

Error applying plan:

5 error(s) occurred:

* aws_s3_bucket.binaryalert_log_bucket (destroy): 1 error(s) occurred:
* aws_s3_bucket.binaryalert_log_bucket: Error deleting S3 Bucket: BucketNotEmpty: The bucket you tried to delete is not empty. You must delete all versions in the bucket.
    status code: 409, request id: xxxxx, host id: yyyyyyyyyyy
    "logstotal.binaryalert-binaries.eu-west-2.access-logs"
* aws_s3_bucket.binaryalert_binaries (destroy): 1 error(s) occurred:
* aws_s3_bucket.binaryalert_binaries: Error deleting S3 Bucket: BucketNotEmpty: The bucket you tried to delete is not empty. You must delete all versions in the bucket.
    status code: 409, request id: 17B2137ACF4CE8E5, host id: yyyyyyyyyyyyyyy
    "logstotal.binaryalert-binaries.eu-west-2"
* local.sns_publications: local.sns_publications: Resource 'aws_sns_topic.yara_match_alerts' does not have attribute 'name' for variable 'aws_sns_topic.yara_match_alerts.name'
* local.sqs_age: local.sqs_age: Resource 'aws_sqs_queue.s3_object_queue' does not have attribute 'message_retention_seconds' for variable 'aws_sqs_queue.s3_object_queue.message_retention_seconds'
* local.sqs: local.sqs: Resource 'aws_sqs_queue.s3_object_queue' does not have attribute 'name' for variable 'aws_sqs_queue.s3_object_queue.name'

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

Desired Change

Introduce a destroy option in manage.py.
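
Part of such a command would need to empty the versioned buckets before Terraform can delete them. A sketch (the bucket name is a placeholder):

import boto3

def empty_bucket(bucket_name: str) -> None:
    """Delete every object version and delete marker so the bucket can be destroyed."""
    bucket = boto3.resource('s3').Bucket(bucket_name)
    bucket.object_versions.delete()

After emptying each bucket, the command could run subprocess.check_call(['terraform', 'destroy']) the same way manage.py already shells out for terraform apply.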

Flatten SNS alert for better PagerDuty formatting

A current YARA match alert is of the form:

{
    'FileInfo': { ... },
    'MatchedRules': [
        {
            'RuleFile': 'rules.yara',
            'RuleName': 'my_rule_name',
            ...
        }
    ]
}

Unfortunately, PagerDuty does not nicely format the list of MatchedRules. To make the alert easier to read, the format should be changed to:

{
    'FileInfo': { ... },
    'NumMatchedRules': 2,
    'MatchedRule1': {
        'RuleFile': 'rules.yara',
        'RuleName': 'my_rule_name',
        ...
    },
    'MatchedRule2': { ... }
}

Information about each matched rule then becomes a top-level key, which PagerDuty formats much more cleanly.
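
A sketch of the transformation:

def flatten_alert(alert: dict) -> dict:
    """Hoist each matched rule to a numbered top-level key for PagerDuty."""
    matched = alert.pop('MatchedRules', [])
    alert['NumMatchedRules'] = len(matched)
    for index, rule in enumerate(matched, start=1):
        alert['MatchedRule{}'.format(index)] = rule
    return alert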

Error analyzing PDFs: `pdftotext` not found [JSONDecodeError]

Background

yextend can parse PDFs to scan their individual components, which is awesome! Unfortunately, this relies on pdftotext, a program not available in Lambda. So when BinaryAlert scans a PDF, yextend returns an empty string and the result is a JSONDecodeError.

Desired Change

  1. Add error handling around yextend - if it fails for any reason, we should still continue with the regular analysis (sketched below)
  2. Bundle pdftotext in the Lambda dependencies (this may not happen in v1.1)
  3. Problems like this will be mitigated in the future once yextend supports portable installation
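
A sketch of the error handling in item 1 (the yextend command-line flags here are assumptions, not verified against the yextend docs):

import json
import subprocess

def yextend_results(ruleset_path: str, target_path: str) -> list:
    """Return yextend's parsed JSON output, or [] if anything goes wrong."""
    # Hypothetical invocation; adjust flags to match the bundled yextend build.
    command = ['yextend', '-r', ruleset_path, '-t', target_path, '-j']
    try:
        return json.loads(subprocess.check_output(command))
    except (OSError, subprocess.CalledProcessError, json.JSONDecodeError):
        return []  # fall through to the normal YARA analysis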

Analyzer should pull bucket from event, not environment variables

Background

Right now, the analyzer uses os.environ['S3_BUCKET_NAME'] to determine which bucket to download binaries from. The problem with this approach is that the analyzer can never be invoked for any other bucket. If the bucket name were pulled from the event notification instead, the analyzer could be invoked directly for any S3 object.

In particular, this would make it easy for users to manually add event notifications to their existing S3 buckets.

Desired Change

Use the bucket name from the event notification instead of an environment variable.
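
S3 event notifications already carry the bucket name alongside the object key, so a small helper (a sketch) could feed the analyzer:

def bucket_and_keys(event):
    """Yield (bucket_name, object_key) pairs from an S3 event notification."""
    for record in event['Records']:
        yield record['s3']['bucket']['name'], record['s3']['object']['key']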

question

Hello, is there any way to run this service on-premises instead of AWS? I don't have AWS. My goal is to capture binary files and do the analysis on another platform. Any ideas?

Force new versions of every Lambda on every deploy

The reason is simple: it ensures all configuration updates are applied. Right now, if you change a configuration option (e.g. the memory limit) in terraform.tfvars and deploy, the Lambda environment variables may be updated, but they won't take effect until a new Lambda version is published.

If every deploy publishes new versions of every Lambda function, all configuration updates are guaranteed to go through. This also has a nice atomicity property: a deploy creates a new snapshot of every Lambda function, not just a subset of them.
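
A sketch of the publish step:

import boto3

LAMBDA = boto3.client('lambda')

def publish_all(function_names):
    """Publish a new version of each function, snapshotting code and config."""
    return {name: LAMBDA.publish_version(FunctionName=name)['Version']
            for name in function_names}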

Support UPX unpacking

Packed binaries evade simple string-based detection. Detect whether files are packed and unpack them before analyzing with YARA.
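
A sketch for UPX specifically (assumes a upx binary is bundled with the Lambda deployment package):

import subprocess

def maybe_unpack(file_path: str) -> None:
    """Try in-place UPX decompression; leave non-packed files untouched."""
    try:
        subprocess.check_call(['upx', '-d', file_path])
    except (OSError, subprocess.CalledProcessError):
        pass  # upx missing or file not UPX-packed; analyze as-is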

New binaries are not prioritized during batch analysis

Background

When a batch analysis is running (which can take many hours depending on the size of the bucket), the same queue is used for both the batch analysis and new incoming binaries.

Desired Change

Ideally, newly added binaries should somehow be prioritized and analyzed first.

Update remote rules nightly

What would it take to run a nightly job to update the remote rule sets? Adding sources to the remote rules makes unit_test fail.

/opt/binaryalert/rules/clone_rules.py

REMOTE_RULE_SOURCES = {
    'https://github.com/Neo23x0/signature-base.git': ['yara'],
    'https://github.com/YARA-Rules/rules.git': ['CVE_Rules'],
    'https://github.com/SupportIntelligence/Icewater.git': ['']
}

$ ./manage.py unit_test
.........................................................................F
======================================================================
FAIL: test_update_rules (tests.rules.update_rules_test.UpdateRulesTest)
Verify which rules files were saved and deleted.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib64/python3.6/unittest/mock.py", line 1179, in patched
    return func(*args, **keywargs)
  File "/opt/binaryalert/tests/rules/update_rules_test.py", line 52, in test_update_rules
    self.assertEqual(expected_files, set(compile_rules._find_yara_files()))
AssertionError: Items in the second set but not the first:
'github.com/SupportIntelligence/Icewater.git/CVE_Rules/cloned.yara'

----------------------------------------------------------------------
Ran 74 tests in 18.957s

FAILED (failures=1)
TEST FAILED: Unit tests failed

We also need a way to make sure all of the Python libraries required by the rules are available.

/opt/binaryalert/rules/clone_rules.py

REMOTE_RULE_SOURCES = {
    'https://github.com/Neo23x0/signature-base.git': ['yara'],
    'https://github.com/YARA-Rules/rules.git': [''],
    'https://github.com/SupportIntelligence/Icewater.git': ['']
}
$ ./manage.py compile_rules
Traceback (most recent call last):
  File "./manage.py", line 495, in <module>
    main()
  File "./manage.py", line 491, in main
    manager.run(args.command)
  File "./manage.py", line 352, in run
    getattr(self, command)()  # Command validation already happened in the ArgumentParser.
  File "./manage.py", line 421, in compile_rules
    compile_rules.compile_rules(COMPILED_RULES_FILENAME)
  File "/opt/binaryalert/rules/compile_rules.py", line 36, in compile_rules
    externals={'extension': '', 'filename': '', 'filepath': '', 'filetype': ''})
yara.SyntaxError: ./Mobile_Malware/Android_FakeApps.yar(101): invalid field name "app_name"

Would it be better to remove the failing rules or to install the missing Python libraries? One workaround is to drop rules that fail to compile:

/opt/binaryalert/rules/compile_rules.py

# Suggested workaround: drop any rule file that fails to compile on its own.
for relative_path in _find_yara_files():
    rule_path = os.path.join(RULES_DIR, relative_path)
    try:
        yara.compile(filepath=rule_path)
    except yara.Error:
        os.remove(rule_path)

yara_filepaths = {relative_path: os.path.join(RULES_DIR, relative_path)
                  for relative_path in _find_yara_files()}

Compilation also requires enough memory to complete; these rule sets needed a t2.small instance to build:

$ ./manage.py compile_rules
Traceback (most recent call last):
  File "./manage.py", line 495, in <module>
    main()
  File "./manage.py", line 491, in main
    manager.run(args.command)
  File "./manage.py", line 352, in run
    getattr(self, command)()  # Command validation already happened in the ArgumentParser.
  File "./manage.py", line 421, in compile_rules
    compile_rules.compile_rules(COMPILED_RULES_FILENAME)
  File "/opt/binaryalert/rules/compile_rules.py", line 45, in compile_rules
    externals={'extension': '', 'filename': '', 'filepath': '', 'filetype': ''})
MemoryError

Deploys also start failing once the number of rules grows past a certain point:

$ ./manage.py apply
Traceback (most recent call last):
  File "./manage.py", line 495, in <module>
    main()
  File "./manage.py", line 491, in main
    manager.run(args.command)
  File "./manage.py", line 352, in run
    getattr(self, command)()  # Command validation already happened in the ArgumentParser.
  File "./manage.py", line 382, in apply
    subprocess.check_call(['terraform', 'apply', '-auto-approve=false'])
  File "/usr/lib64/python3.6/subprocess.py", line 291, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['terraform', 'apply', '-auto-approve=false']' returned non-zero exit status 1

Updated YARA rules do not re-alert

All YARA matches are saved to DynamoDB, but alerts are only sent to SNS if the YARA rule name has not previously matched the given binary. There are two problems with this:

  1. Rules which are renamed or reorganized will re-trigger alerts
  2. Rules whose content changes (e.g. a different rule condition) will not re-trigger an alert

Instead of keying on the rule name, the lookup should compare some kind of hash of the YARA rule contents.
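
A sketch of the keying change (exactly what counts as the rule "contents" - strings, condition, metadata - is a design decision):

import hashlib

def rule_digest(rule_body: str) -> str:
    """Digest of the rule's strings/condition (not its name), so renames
    don't re-alert but content changes do."""
    return hashlib.sha256(rule_body.encode('utf-8')).hexdigest()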

Additional metrics based on metric filters: memory usage, etc

CloudWatch metric filters allow you to create metrics by pattern-matching log data. This would let us add at least two more useful metrics:

  • Memory usage of the functions (a metric which is not available by default but is logged at the end of every Lambda invocation)
  • Counts of specific error messages, e.g. "Task timed out" or "yara internal error"

test error

Background

Running the unit tests for the first time, I get 4 errors:

ERROR: test_match_eicar_string (tests.rules.eicar_rule_test.EicarRuleTest)
Should match the exact EICAR string.

Traceback (most recent call last):
  File "/Users/sgooch/Documents/Stash/binaryalert/tests/rules/eicar_rule_test.py", line 16, in setUp
    with open(EICAR_TXT_FILE, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/sgooch/Documents/Stash/binaryalert/tests/rules/../files/eicar.txt'

======================================================================
ERROR: test_match_eicar_with_trailing_spaces (tests.rules.eicar_rule_test.EicarRuleTest)
Trailing whitespace is allowed after the EICAR string.

Traceback (most recent call last):
  File "/Users/sgooch/Documents/Stash/binaryalert/tests/rules/eicar_rule_test.py", line 16, in setUp
    with open(EICAR_TXT_FILE, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/sgooch/Documents/Stash/binaryalert/tests/rules/../files/eicar.txt'

======================================================================
ERROR: test_no_match_if_eicar_is_not_beginning (tests.rules.eicar_rule_test.EicarRuleTest)
No match if EICAR string is not the beginning of the file.

Traceback (most recent call last):
  File "/Users/sgooch/Documents/Stash/binaryalert/tests/rules/eicar_rule_test.py", line 16, in setUp
    with open(EICAR_TXT_FILE, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/sgooch/Documents/Stash/binaryalert/tests/rules/../files/eicar.txt'

======================================================================
ERROR: test_no_match_if_eicar_is_not_end (tests.rules.eicar_rule_test.EicarRuleTest)
No match if non-whitespace comes after the EICAR string.

Traceback (most recent call last):
  File "/Users/sgooch/Documents/Stash/binaryalert/tests/rules/eicar_rule_test.py", line 16, in setUp
    with open(EICAR_TXT_FILE, 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/sgooch/Documents/Stash/binaryalert/tests/rules/../files/eicar.txt'


Ran 74 tests in 3.971s

FAILED (errors=4)
TEST FAILED: Unit tests failed

Space in S3 filename breaks analyzers

If there is a space in a filename uploaded to S3, the analyzer is unable to download it. See if this can be fixed; otherwise, update the documentation.
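
One plausible cause (an assumption, not confirmed in this issue): S3 event notifications URL-encode object keys, so a space arrives as '+' and the raw key no longer matches any object. Decoding the key before download would fix that:

from urllib.parse import unquote_plus

# S3 event notifications URL-encode object keys: 'my file.exe' arrives
# as 'my+file.exe', so decode before calling GetObject.
s3_key = unquote_plus('my+file.exe')  # -> 'my file.exe'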

Add CarbonBlack downloader function

Provide a downloader function which copies binaries from CarbonBlack into the BinaryAlert S3 bucket. This makes it easier for CarbonBlack customers to copy over both their existing and future CarbonBlack binaries.
