machina's People

Contributors

ehrenb

machina's Issues

Time-based triggers

Currently, modules trigger on incoming data types. Modules like Similarity might be better triggered at a fixed frequency rather than once per input.

  • PeriodicWorker base class
  • SimilarityAnalysis conversion
    • fix duplication of rels
    • Note on bi-directional relationships with Neo4j: see neo4j-contrib/neomodel#345. As a sanity check, the following query shows that 2 relationships do exist: "MATCH (a)-[r:SIMILAR]-(b) RETURN r"
  • Document
    • Dev documentation
    • API documentation
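
A minimal sketch of what a frequency-triggered base class could look like, using only the standard library. The traceback in the "PeriodicWorker rocketry error" issue shows that the project's own PeriodicWorker imports rocketry, so this is a stand-in for illustration, not the project's implementation; the `interval` parameter, the `run_once` hook, and `CountingWorker` are hypothetical names.

```python
import threading


class PeriodicWorker:
    """Hypothetical base class: calls run_once() every `interval` seconds."""

    def __init__(self, interval: float) -> None:
        self.interval = interval
        self._stop_event = threading.Event()
        self._thread = None

    def run_once(self) -> None:
        """Subclasses put their periodic analysis here."""
        raise NotImplementedError

    def start(self) -> None:
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop_event.set()
        if self._thread is not None:
            self._thread.join()

    def _loop(self) -> None:
        # Event.wait() returns True once stop() is called, which ends the loop;
        # otherwise it times out after `interval` seconds and the work runs again.
        while not self._stop_event.wait(self.interval):
            self.run_once()


class CountingWorker(PeriodicWorker):
    """Example subclass that only counts how often it was triggered."""

    def __init__(self, interval: float) -> None:
        super().__init__(interval)
        self.runs = 0
        self.ran_three_times = threading.Event()

    def run_once(self) -> None:
        self.runs += 1
        if self.runs >= 3:
            self.ran_three_times.set()
```

A SimilarityAnalysis-style worker would override `run_once` with its graph comparison instead of reacting to each incoming message.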

ghidra-base

Create a new base image to enable some Ghidra analyses.

  • base config for GhidraWorker separate from Worker.json
  • environment var for -max-cpu command line option
  • patch analyzeHeadless to support GHIDRA_MAXMEM environment variable for MAXMEM
  • example ghidra-project-creator subclass

refactor build

Currently, the docker build process performs a 'git clone' to rebuild images with updated code. This prevents any of the useful docker layering from happening (since Docker can't detect a change).

ssdeep worker

  • SSDeep analysis worker (works on '*')
  • SSDeep similarity edge (threshold?)
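
The threshold question could be handled roughly as below. This is a sketch under assumptions: the default threshold of 50 is arbitrary, `similar_pairs` is a hypothetical helper, and the comparison function is injected so the example runs without the ssdeep bindings installed.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple


def similar_pairs(
    hashes: Dict[str, str],
    compare: Callable[[str, str], int],
    threshold: int = 50,  # assumed default, not taken from the issue
) -> List[Tuple[str, str, int]]:
    """Return (id_a, id_b, score) for each pair scoring at or above threshold."""
    edges = []
    for (id_a, hash_a), (id_b, hash_b) in combinations(sorted(hashes.items()), 2):
        score = compare(hash_a, hash_b)
        if score >= threshold:
            edges.append((id_a, id_b, score))
    return edges
```

With the python-ssdeep package available, `ssdeep.compare` (which returns a match score from 0 to 100) can be passed as `compare`, and each returned tuple would become one similarity edge.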

auto-render schemas as HTML in docs

Can't do multi-stage builds in the docs Dockerfile, because there is a circular dependency problem.

...
# multi-stage build to copy in worker source modules
# for autodoc'ing their source and schemas
# TODO: resolve how to mock imports for each, as 
# we dont want to have to install all 3rd party deps
# for all workers. see https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#confval-autodoc_mock_imports 

# FROM behren/machina-androdguard:latest as androguard
# FROM behren/machina-binwalk:latest as binwalk_img
# RUN mkdir /machina/binwalk && touch /machina/binwalk/__init__.py
# COPY --from=binwalk_img /machina/src /machina/binwalk

# FROM behren/machina-bz2:latest as bz2
# FROM behren/machina-exif:latest as exif
# FROM behren/machina-findurls:latest as findurls
# FROM behren/machina-gzip:latest as gzip

# FROM behren/machina-identifier:latest as identifier
# RUN mkdir /machina/identifier && touch /machina/identifier/__init__.py
# COPY --from=identifier /machina/src /machina/identifier

# FROM behren/machina-jar:latest as jar
# FROM behren/machina-similarity:latest as similarity
# FROM behren/machina-ssdeep:latest as ssdeep
# FROM behren/machina-tar:latest as tar
# FROM behren/machina-zip:latest as zip
# FROM behren/machina-ghidra-project-creator:latest as ghidra-project-creator

...

Also, to import these for autodoc, we can mock the imports (autodoc_mock_imports) to avoid import failures instead of bloating the image with all dependencies:

conf.py

autodoc_mock_imports = [
    'magic'  # module name imported by the python-magic package
]

For now, just keeping referential documentation in workers.csv in the docs repo.

ELK stack

Add an ELK stack to monitor all container logs in the namespace.

PeriodicWorker rocketry error

The error below occurs on startup of the system. It is an incompatibility between pydantic 2 and rocketry (through its redbird dependency); see Miksus/rocketry#225 and Miksus/rocketry#210. For now, I manually downgraded pydantic to 1.10.13.

machina-similarityanalysis-1      | Traceback (most recent call last):
machina-similarityanalysis-1      |   File "/machina/src/run.py", line 3, in <module>
machina-similarityanalysis-1      |     from similarityanalysis import SimilarityAnalysis
machina-similarityanalysis-1      |   File "/machina/src/similarityanalysis.py", line 6, in <module>
machina-similarityanalysis-1      |     from machina.core.periodic_worker import PeriodicWorker
machina-similarityanalysis-1      |   File "/usr/local/lib/python3.10/dist-packages/machina-0.1-py3.10.egg/machina/core/periodic_worker.py", line 8, in <module>
machina-similarityanalysis-1      |     from rocketry import Rocketry
machina-similarityanalysis-1      |   File "/usr/local/lib/python3.10/dist-packages/rocketry/__init__.py", line 1, in <module>
machina-similarityanalysis-1      |     from .session import Session
machina-similarityanalysis-1      |   File "/usr/local/lib/python3.10/dist-packages/rocketry/session.py", line 18, in <module>
machina-similarityanalysis-1 exited with code 1
machina-similarityanalysis-1      |     from rocketry.log.defaults import create_default_handler
machina-similarityanalysis-1      |   File "/usr/local/lib/python3.10/dist-packages/rocketry/log/defaults.py", line 1, in <module>
machina-similarityanalysis-1      |     from redbird.logging import RepoHandler
machina-similarityanalysis-1      |   File "/usr/local/lib/python3.10/dist-packages/redbird/__init__.py", line 2, in <module>
machina-similarityanalysis-1      |     from .base import BaseRepo, BaseResult
machina-similarityanalysis-1      |   File "/usr/local/lib/python3.10/dist-packages/redbird/base.py", line 116, in <module>
machina-similarityanalysis-1      |     class BaseRepo(ABC, BaseModel):
machina-similarityanalysis-1      |   File "/usr/local/lib/python3.10/dist-packages/redbird/base.py", line 153, in BaseRepo
machina-similarityanalysis-1      |     ordered: bool = Field(default=False, const=True)
machina-similarityanalysis-1      |   File "/usr/local/lib/python3.10/dist-packages/pydantic/fields.py", line 764, in Field
machina-similarityanalysis-1      |     raise PydanticUserError('`const` is removed, use `Literal` instead', code='removed-kwargs')
machina-similarityanalysis-1      | pydantic.errors.PydanticUserError: `const` is removed, use `Literal` instead

Improve flexibility for type compatibility spec

Support for whitelist/blacklist options

  • Demonstrate with findurls. Findurls shouldn't bind to "URL", but it should bind to everything else.
  • document removal of '*', blacklist/whitelist, artifact
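
A small sketch of how a whitelist/blacklist check might replace '*'. The function name, the parameter names, and the rule that an empty blacklist means "bind to everything" are assumptions for illustration, not the project's schema.

```python
from typing import Iterable, Optional


def accepts_type(
    data_type: str,
    whitelist: Optional[Iterable[str]] = None,
    blacklist: Optional[Iterable[str]] = None,
) -> bool:
    """Decide whether a worker should bind to `data_type`."""
    if whitelist is not None and blacklist is not None:
        raise ValueError("specify a whitelist or a blacklist, not both")
    if whitelist is not None:
        # Bind only to the listed types.
        return data_type in set(whitelist)
    if blacklist is not None:
        # Bind to everything except the listed types;
        # an empty blacklist expresses what '*' used to.
        return data_type not in set(blacklist)
    # Assumption: a worker with neither list binds to nothing.
    return False
```

For findurls this would be `accepts_type(t, blacklist=["url"])`: every type is accepted except the URLs it produces itself.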

orientdb client version errors

OrientDB's Python client has been consistently broken for the past couple years. A newer fork (orientechnologies/pyorient#42, https://github.com/brucetony/pyorient) claims to support 3.1.x, but using the 3.1.12 Docker image results in the below error:

Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/machina/images/identifier/src/identifier.py", line 121, in callback
    type=resolved_type)
  File "/src/pyorient/pyorient/ogm/broker.py", line 56, in create
    return self.g.create_vertex(self.element_cls, **kwargs)
  File "/src/pyorient/pyorient/ogm/graph.py", line 532, in create_vertex
    result = self.client.command(self.create_vertex_command(vertex_cls, **kwargs))[0]
  File "/src/pyorient/pyorient/orient.py", line 481, in command
    return self.get_message("CommandMessage").prepare((QUERY_CMD,) + args).send().fetch_response()
  File "/src/pyorient/pyorient/utils.py", line 48, in wrap_function
    return wrap(*args, **kwargs)
  File "/src/pyorient/pyorient/utils.py", line 61, in wrap_function
    return wrap(*args, **kwargs)
  File "/src/pyorient/pyorient/messages/commands.py", line 128, in prepare
    self._encode_field(x) for x in _payload_definition
  File "/src/pyorient/pyorient/messages/commands.py", line 128, in <genexpr>
    self._encode_field(x) for x in _payload_definition
  File "/src/pyorient/pyorient/messages/database.py", line 379, in _encode_field
    _content = struct.pack("!i", len(v)) + v
TypeError: object of type 'VertexCommand' has no len()

Until official (or stable) maintenance of the pyorient project happens, Machina will have to continue to depend on an older version of OrientDB and another pyorient fork.

Working client: https://github.com/alanmeeson/pyorient.git@0317a87369675df9b33fd38af451099c3c011d40#egg=pyorient
Working server: 2.2

prevent redundant OGM creation attempts

Currently, each worker attempts to initialize the OGM. While no duplicate OGMs will be created, it does cause significant delay in start time. There should be one dedicated service to init the OGMs, or a check to see if the OGM exists before init.

OrientDB shows the following error:

2022-12-25 16:30:39:479 SEVER Exception `5EA22C9A` in storage `plocal:/orientdb/databases/machina`: 3.2.13 (build 1b0940491143c734d9f7338b321c2cde319a79ef, branch UNKNOWN) [OLocalPaginatedStorage]
com.orientechnologies.orient.core.exception.OCommandExecutionException: Property 'apk.md5' already exists. Remove it before to retry.
	DB name="machina"
	at com.orientechnologies.orient.core.sql.OCommandExecutorSQLCreateProperty.execute(OCommandExecutorSQLCreateProperty.java:298)
	at com.orientechnologies.orient.core.sql.OCommandExecutorSQLDelegate.execute(OCommandExecutorSQLDelegate.java:74)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.executeCommand(OAbstractPaginatedStorage.java:4205)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.command(OAbstractPaginatedStorage.java:4171)
	at com.orientechnologies.orient.core.command.OCommandRequestTextAbstract.execute(OCommandRequestTextAbstract.java:63)
	at com.orientechnologies.orient.server.OConnectionBinaryExecutor.executeCommand(OConnectionBinaryExecutor.java:618)
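
The "check before init" option could be sketched as follows. Everything here is hypothetical: how the set of existing properties is read from the database and how a property is actually created are left to the injected `existing` set and `create` callback, since they depend on the client in use.

```python
from typing import Callable, Dict, Iterable, List, Set


def create_missing_properties(
    desired: Dict[str, Iterable[str]],
    existing: Set[str],
    create: Callable[[str, str], None],
) -> List[str]:
    """Create only the 'class.property' entries that do not exist yet."""
    created = []
    for class_name, properties in desired.items():
        for prop in properties:
            full_name = f"{class_name}.{prop}"
            if full_name in existing:
                # Skipping avoids errors like "Property 'apk.md5' already exists".
                continue
            create(class_name, prop)
            existing.add(full_name)
            created.append(full_name)
    return created
```

Running this once from a dedicated init service, or from each worker with a freshly queried `existing` set, would avoid re-issuing the create commands that fail and slow down startup.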

streamline release with gh release

"gh release create" creates tags automatically. Do away with the git tag + manual release script and replace them with the "gh release" command.

split analysis modules into different repositories

  • move analysis modules, 'images' to new repositories
  • create git submodule for each
  • replace 'master' branch with 'main'
  • document 'git submodule foreach' for pull and push, and 'git submodule add'
  • move worker schemas to new module repos
  • retest build (docker compose build)
  • retest prod stack (docker compose pull, or just docker compose up)

sphinx rtd docs

  • Python typing, docstrings
    • base class
    • models
  • architecture diagram
  • migrate MD docs to RST
  • Sphinx RTD w/ Dockerfile and web service
  • move Dockerfile.docs and docs/ to a separate repository. As it stands, since the docs image depends on base-alpine, I won't be able to trigger a downstream build because both code bases are in the same repository

ClamAV worker

Create a ClamAV worker module that fires on (all?) types:

In Dockerfile:

At runtime

  • run 'clamd' and 'freshclam -d' to background clamd and db updates

  • communicate with clamd via Python over the socket interface described here: https://manpages.debian.org/unstable/clamav-daemon/clamd.8.en.html

    • clamscan is slow, switch to clamav-daemon (clamd) in background so the service is always ready to respond
    • volume for clam_db to speed up start times when freshclam runs?
    • validate using known malicious sample
  • add to csv in docs
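
A sketch of talking to clamd over its socket with the INSTREAM command from the linked man page: the command is followed by chunks, each prefixed with its length as a 4-byte unsigned integer in network byte order, and a zero-length chunk ends the stream. The function names and the chunk size are hypothetical, and the socket path must match the LocalSocket setting of the deployed clamd.

```python
import socket
import struct
from typing import Iterator, Optional


def instream_frames(data: bytes, chunk_size: int = 8192) -> Iterator[bytes]:
    """Yield the byte sequences that make up one INSTREAM request."""
    yield b"zINSTREAM\0"  # 'z' prefix: command and reply are NUL-terminated
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        yield struct.pack("!I", len(chunk)) + chunk
    yield struct.pack("!I", 0)  # zero-length chunk terminates the stream


def parse_reply(reply: bytes) -> Optional[str]:
    """Return the signature name on a detection, None if the stream is clean."""
    text = reply.rstrip(b"\0\n").decode("utf-8", errors="replace")
    if text.endswith(" FOUND"):
        return text.split(": ", 1)[1][: -len(" FOUND")]
    if text.endswith("OK"):
        return None
    raise RuntimeError(f"unexpected clamd reply: {text}")


def scan(data: bytes, socket_path: str) -> Optional[str]:
    """Send `data` to a running clamd over its Unix socket and parse the reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        for frame in instream_frames(data):
            sock.sendall(frame)
        reply = b""
        while not reply.endswith(b"\0"):
            part = sock.recv(4096)
            if not part:
                break
            reply += part
    return parse_reply(reply)
```

Because clamd stays resident, each `scan` call avoids the signature-loading cost that makes clamscan slow.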

PyOrientSecurityAccessException

After some time (20 minutes) of analyzing a jffs2, workers start throwing the following error:

Exception in thread Thread-3820 (callback):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/machina/src/identifier.py", line 137, in callback
    origin_node = self.graph.get_vertex(data['origin']['id'])
  File "/usr/lib/python3.10/site-packages/pyorient/ogm/graph.py", line 612, in get_vertex
    record = self.client.command('SELECT FROM {}'.format(vertex_id))
  File "/usr/lib/python3.10/site-packages/pyorient/orient.py", line 481, in command
    return self.get_message("CommandMessage").prepare((QUERY_CMD,) + args).send().fetch_response()
  File "/usr/lib/python3.10/site-packages/pyorient/messages/commands.py", line 143, in fetch_response
    super(CommandMessage, self).fetch_response()
  File "/usr/lib/python3.10/site-packages/pyorient/messages/database.py", line 300, in fetch_response
    self._decode_all()
  File "/usr/lib/python3.10/site-packages/pyorient/messages/database.py", line 283, in _decode_all
    self._decode_header()
  File "/usr/lib/python3.10/site-packages/pyorient/messages/database.py", line 229, in _decode_header
    raise PyOrientCommandException(
pyorient.exceptions.PyOrientSecurityAccessException: com.orientechnologies.orient.core.exception.OSecurityAccessException - Invalid authentication info for access to the database com.orientechnologies.orient.core.metadata.security.auth.OTokenAuthInfo@18287ceb
	DB name="machina"

bzip2

bzip2 decompression

configurable logging per module

Base config should be INFO. Override individual modules in their respective config file.

  • Create Worker.json worker config to inherit from
  • Apply Worker.json in Worker Init()
    • Apply Worker config, potentially overriding Base Worker
  • Init logger with setLevel()
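
The inheritance described above could look roughly like this. The `log_level` key and both function names are assumptions; the real Worker.json layout may differ.

```python
import json
import logging
from typing import Any, Dict


def load_worker_config(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Start from the base (Worker.json) config and apply a module's overrides."""
    merged = dict(base)
    merged.update(override)
    return merged


def init_logger(name: str, config: Dict[str, Any]) -> logging.Logger:
    """Create the module logger at the configured level."""
    logger = logging.getLogger(name)
    logger.setLevel(config["log_level"])  # accepts level names such as "INFO"
    return logger


# Stand-ins for the contents of Worker.json and one module's own config file.
base_config = json.loads('{"log_level": "INFO"}')
module_config = json.loads('{"log_level": "DEBUG"}')

logger = init_logger("findurls", load_worker_config(base_config, module_config))
```

A module without its own `log_level` entry keeps the INFO default from the base config.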

cpio

cpio unarchiving

extended type resolution using patterns

For findurls, there is no MIME type or detailed data associated with a URL, so the worker has to type it as 'url' manually when resubmitting. Blind resubmission (e.g. from the CLI) should also be supported, but it cannot rely on MIME data either. To support this, there should be a new resolution method that is not based on the MIME/detailed type, but on a regex or pattern matched against the data.
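
A sketch of pattern-based resolution, assuming a table of (type name, regex) pairs that is consulted when MIME-based typing yields nothing useful. The table, the function name, and the URL regex itself are illustrative only.

```python
import re
from typing import Optional

# Hypothetical pattern table: (type name, compiled regex), checked in order.
PATTERN_TYPES = [
    ("url", re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*://\S+")),
]


def resolve_by_pattern(data: bytes) -> Optional[str]:
    """Return the first type whose pattern matches the whole payload."""
    try:
        text = data.decode("utf-8").strip()
    except UnicodeDecodeError:
        return None  # binary data: leave it to MIME-based resolution
    for type_name, pattern in PATTERN_TYPES:
        if pattern.fullmatch(text):
            return type_name
    return None
```

Using `fullmatch` keeps a document that merely contains a URL from being typed as a URL.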

lzma

lzma decompression

prevent duplicate artifacts

  • prevent duplication of things being retyped
  • prevent duplication of extracted things from different modules, e.g. files carved vs unzipped
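
One possible approach, not stated in the issue, is to key artifacts by a digest of their content so that the same bytes produced by two modules resolve to one artifact. `ArtifactIndex` is a hypothetical in-memory stand-in for a lookup that would live in the graph database.

```python
import hashlib
from typing import Dict


class ArtifactIndex:
    """Maps a content digest to the id of the first artifact stored with it."""

    def __init__(self) -> None:
        self._by_digest: Dict[str, str] = {}

    def register(self, artifact_id: str, content: bytes) -> str:
        """Return the id to use: the existing one if the content was seen before."""
        digest = hashlib.sha256(content).hexdigest()
        return self._by_digest.setdefault(digest, artifact_id)
```

A carved file and an unzipped file with identical bytes would then share one id, and the second module could add an edge to the existing node instead of creating a new one.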

bump build-push-actions in module Actions

The build actions are giving the following warning about an upcoming deprecation:

Warning: The `save-state` command is deprecated and will be disabled soon. Please upgrade to using Environment Files. For more information see: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/

The build-push-action version should be bumped:

docker/build-push-action#779 (comment)

URLAnalysis worker

Now that URL objects are created by the Identifier (instead of findurls), there needs to be another worker that analyzes URLs. This worker should:

  • Update URL obj with the URL string
  • Parse parameters from the URL (if any exist)
  • Parse protocol, path, domain, webpage from the URL string
  • Configurable - visit the web page and copy the response body
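
The parsing steps in the list above can be sketched with the standard library. The returned field names are hypothetical, "webpage" is interpreted here as the last path segment, and the optional page fetch is left out.

```python
from urllib.parse import parse_qs, urlsplit


def analyze_url(url: str) -> dict:
    """Split a URL string into the parts the worker would store."""
    parts = urlsplit(url)
    return {
        "url": url,
        "protocol": parts.scheme,
        "domain": parts.hostname,  # lowercased, without port or credentials
        "path": parts.path,
        "webpage": parts.path.rsplit("/", 1)[-1],  # assumed meaning: last segment
        "parameters": parse_qs(parts.query),  # empty dict if there is no query
    }
```

For example, `analyze_url("https://example.com/docs/index.html?a=1")` yields the protocol `https`, the domain `example.com`, the path `/docs/index.html`, and the parameters `{"a": ["1"]}`.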
