mishegos's Issues

Change output format to something more performant

One easy win for mishegos's performance would be to change the cohort output format from JSON (via parson) to something that requires less serialization work/fewer allocations.

Some options:

  • Just dump the cohort structs directly
    • Pros: Easy, fast
    • Cons: Annoying to consume; breaks backwards compatibility unless we length- and version-prefix the structs (see the sketch after this list)
  • CBOR/msgpack/some other JSON-like binary format
    • Pros: Probably fast
    • Cons: Need to vendor something potentially complex, potentially annoying APIs
  • netstrings/canonical S-exprs/something bespoke
    • Pros: Tweakable; would require little or no vendoring
    • Cons: Would probably need to write a parser as well
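
For the struct-dump option, here's a minimal sketch of what the consumer side of a length- and version-prefixed framing could look like. The header layout (little-endian u32 version followed by a u32 payload length) is an assumption for illustration, not the actual cohort struct:

    import struct
    import sys

    # Hypothetical framing, for illustration only: a little-endian u32
    # version, a u32 payload length, then the raw cohort bytes.
    HEADER = struct.Struct("<II")

    def read_records(stream):
        while True:
            header = stream.read(HEADER.size)
            if not header:
                break  # clean EOF
            if len(header) < HEADER.size:
                raise EOFError("truncated header")
            version, length = HEADER.unpack(header)
            payload = stream.read(length)
            if len(payload) < length:
                raise EOFError("truncated record")
            yield version, payload

    for version, payload in read_records(sys.stdin.buffer):
        print(f"cohort v{version}: {len(payload)} bytes")

Putting the version first lets a consumer refuse records it doesn't understand instead of misparsing them, which addresses the backwards-compatibility concern.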

Additional analyses

The following analyses are needed:

  • size-discrepancies: Find all cohorts where all decoders succeed, but one or more disagree on the decoded instruction's size (see the sketch after this list)
  • destroy-xed: Try to find errors in XED by pressuring it against Zydis, bddisasm, and iced
  • single-status-discrepancy: Exactly one decoder disagrees with all of the others on the instruction's validity (treating everything that isn't S_SUCCESS as a failure)
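
As a sketch of what the size-discrepancies pass could look like over the converted JSONL output — the record shape and field names (outputs, status, ndecoded) are assumptions for illustration, not the actual schema:

    import json
    import sys

    # Assumed record shape (illustrative):
    #   {"input": "...", "outputs": [{"status": "S_SUCCESS", "ndecoded": 3, ...}, ...]}
    for line in sys.stdin:
        cohort = json.loads(line)
        outputs = cohort["outputs"]
        # Only consider cohorts where every decoder succeeded.
        if not all(o["status"] == "S_SUCCESS" for o in outputs):
            continue
        # Flag the cohort if the successful decoders disagree on size.
        if len({o["ndecoded"] for o in outputs}) > 1:
            print(json.dumps(cohort))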

Fix the Zydis worker

Zydis keeps changing their C API and I've been too lazy to update the worker to reflect it, so the current submodule is several months out of date.

Make it easier to control which workers are built

As we continue to add more workers (#5, #8), users will probably want to restrict their builds to just those workers/fuzzer targets that they care about.

They can do that right now by editing the WORKERS variable in src/worker/Makefile, but that's a little tedious. We should expose a variable or other mechanism in the top-level Makefile to limit which workers are built.

Unit tests

It'd be nice to have some basic tests in CI, probably using mishegos's manual mode to feed in some simple inputs.
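
A minimal smoke test might look like the following sketch. The -m flag and the hex-candidates-on-stdin behavior for manual mode are assumptions here, purely for illustration; the real invocation should be taken from the README:

    import subprocess

    # NOP and RET as hex-encoded candidates; "-m" (manual mode) is a
    # hypothetical flag used here purely for illustration.
    candidates = b"90\nc3\n"

    proc = subprocess.run(
        ["./src/mishegos/mishegos", "-m", "./workers.spec"],
        input=candidates,
        capture_output=True,
        check=True,  # raises if mishegos exits nonzero
    )
    assert proc.stdout, "expected at least one cohort on stdout"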

mish2jsonl tool fails after following the build instructions on a fresh Ubuntu 20.04 installation, but works in the Docker image

Problem: mish2jsonl breaks whenever it is run from my main OS shell (Ubuntu 20.04); running it from a clean Docker image works fine.

The main host had no development tools installed, and ran a copy-pasted version of the Docker build steps as an install script, minus the Docker-specific setup.
The python, python3, pip3, and ruby versions are identical across the host and the image. All python3 modules from the Docker image are also present on the main host.

The same input file was used in Docker and on the host.
The input file was tested both in /tmp/ and in the root directory of the repository.

Has anybody else run into this?

./src/mish2jsonl/mish2jsonl < /tmp/mishegos

Traceback (most recent call last):
  File "./src/mish2jsonl/mish2jsonl", line 49, in <module>
    main()
  File "./src/mish2jsonl/mish2jsonl", line 12, in main
    cohorts = Cohorts.from_io(sys.stdin.buffer)
  File "/home/$user/.local/lib/python3.8/site-packages/kaitaistruct.py", line 47, in from_io
    return cls(KaitaiStream(io))
  File "/home/$user/dev/mishegos/src/mish2jsonl/cohorts.py", line 26, in __init__
    self._read()
  File "/home/$user/dev/mishegos/src/mish2jsonl/cohorts.py", line 32, in _read
    self.cohorts.append(self._root.Cohort(self._io, self, self._root))
  File "/home/$user/dev/mishegos/src/mish2jsonl/cohorts.py", line 41, in __init__
    self._read()
  File "/home/$user/dev/mishegos/src/mish2jsonl/cohorts.py", line 49, in _read
    self.outputs[i] = self._root.Output(self._io, self, self._root)
  File "/home/$user/dev/mishegos/src/mish2jsonl/cohorts.py", line 58, in __init__
    self._read()
  File "/home/$user/dev/mishegos/src/mish2jsonl/cohorts.py", line 61, in _read
    self.status = KaitaiStream.resolve_enum(self._root.DecodeStatus, self._io.read_u4le())
AttributeError: type object 'KaitaiStream' has no attribute 'resolve_enum'
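
This looks like runtime/compiler version skew: KaitaiStream.resolve_enum appears to have been added in the 0.9 Kaitai Struct Python runtime, and parsers generated by a 0.9+ kaitai-struct-compiler call it, so an older pip-installed kaitaistruct on the host would fail exactly like this while the Docker image (with a matching pair) works. A quick check, as a sketch:

    import kaitaistruct
    from kaitaistruct import KaitaiStream

    # Parsers generated by kaitai-struct-compiler >= 0.9 call
    # KaitaiStream.resolve_enum, which older runtimes don't define.
    print("kaitaistruct runtime:", kaitaistruct.__version__)
    print("has resolve_enum:", hasattr(KaitaiStream, "resolve_enum"))

If resolve_enum is missing, pip3 install --upgrade kaitaistruct should bring the runtime in line with the generated cohorts.py.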

Compare against real silicon behaviour

Something that we had planned to look into sooner or later is diffing against the behaviour observed from actual CPUs, in a sandsifter-like fashion. Would that be beyond the scope of this project or would you be interested in adding something like that as well?

Regression pipeline

Now that we have CI configured and automated submodule updates via Dependabot, it'd be interesting to hack together some kind of basic regression testing for our implemented workers. Something like this:

  1. We run mishegos for a few hours to generate a decently sized corpus
  2. We run that corpus through our analysis passes to filter out true positives and negatives
  3. We save the filtered corpus somewhere (cloud storage?)
  4. On each Dependabot PR, we run additional CI jobs to see whether any of our results have changed (see the sketch below)
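
For step 4, the comparison could start as simple set arithmetic over the interesting inputs, as in this sketch (baseline.jsonl and the input field name are assumptions):

    import json
    import sys

    # usage: compare.py current.jsonl
    def interesting_inputs(path):
        # One JSON cohort per line, keyed on a hex "input" field
        # (the field name is illustrative).
        with open(path) as f:
            return {json.loads(line)["input"] for line in f if line.strip()}

    baseline = interesting_inputs("baseline.jsonl")
    current = interesting_inputs(sys.argv[1])

    for label, inputs in (("new", current - baseline), ("resolved", baseline - current)):
        for hexstr in sorted(inputs):
            print(f"{label}: {hexstr}")

    # A nonzero exit fails the CI job whenever the results drift.
    sys.exit(1 if current != baseline else 0)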

Allow user to suppress result cohorts for uniform negatives

Right now, mishegos spits out every single cohort for every single candidate tried, even if every decoder agrees that the input is invalid. This results in extremely large outputs by default, since even guided searches of the x86 encoding space produce large numbers of garbage instruction candidates.

We should preserve the default behavior, but introduce a new flag/environment variable that tells mishegos to filter out uniformly negative results.
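
In the meantime, the same effect can be approximated after the fact on the JSONL output; assuming illustrative field names, and treating anything other than S_SUCCESS as a negative:

    import json
    import sys

    for line in sys.stdin:
        cohort = json.loads(line)
        # Drop the cohort only if every worker rejected the input
        # ("outputs"/"status" are assumed field names).
        if any(o["status"] == "S_SUCCESS" for o in cohort["outputs"]):
            sys.stdout.write(line)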

Disable the udis86 worker/decoder

udis86 appears not to have received any changes since 2014 and is currently imposing a maintenance burden on mishegos's build environment (because it needs Python 2), so we should just go ahead and disable it for now.

Document the process of adding a worker

I need to do this at some point, but here's the 30-second version:

  1. Add a new directory under ./src/worker/
  2. Submodule the worker's dependencies under the new directory, if necessary
  3. Update WORKERS in ./src/worker/Makefile to include the new worker
  4. Update the find expression in ALL_SRCS in the top-level Makefile to exclude the submodule(s)
  5. Update workers.spec to list the new worker's shared object
