mishegos's Issues

Change output format to something more performant

One easy win for mishegos's performance would be to change the cohort output format from JSON (via parson) to something that requires less serialization work/fewer allocations.

Some options:

  • Just dump the cohort structs directly
    • Pros: Easy, fast
    • Cons: Annoying to consume; breaks backwards compatibility unless we length- and version-prefix the structs (see the sketch after this list)
  • CBOR/msgpack/some other JSON-like binary format
    • Pros: Probably fast
    • Cons: Need to vendor something potentially complex, potentially annoying APIs
  • netstrings/canonical S-exprs/something bespoke
    • Pros: Tweakable; would require little or no vendoring
    • Cons: Would probably need to write a parser as well
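
For the struct-dump option, here's a minimal sketch of what the consumer side of a length- and version-prefixed framing could look like. The header layout (little-endian u32 version followed by a u32 payload length) is an assumption for illustration, not the actual cohort struct:

    import struct
    import sys

    # Hypothetical framing, for illustration only: a little-endian u32
    # version, a u32 payload length, then the raw cohort bytes.
    HEADER = struct.Struct("<II")

    def read_records(stream):
        while True:
            header = stream.read(HEADER.size)
            if not header:
                break  # clean EOF
            if len(header) < HEADER.size:
                raise EOFError("truncated header")
            version, length = HEADER.unpack(header)
            payload = stream.read(length)
            if len(payload) < length:
                raise EOFError("truncated record")
            yield version, payload

    for version, payload in read_records(sys.stdin.buffer):
        print(f"cohort v{version}: {len(payload)} bytes")

Putting the version first lets a consumer refuse records it doesn't understand instead of misparsing them, which addresses the backwards-compatibility concern.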

Additional analyses

The following analyses are needed:

  • size-discrepancies: Find all cohorts where all decoders succeed, but one or more disagree on the decoded instruction's size (see the sketch after this list)
  • destroy-xed: Try to find errors in XED by pressuring it against Zydis, bddisasm, and iced
  • single-status-discrepancy: Exactly one decoder disagrees with all of the others on the instruction's validity (treating everything that isn't S_SUCCESS as a failure)
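
As a sketch of what the size-discrepancies pass could look like over the converted JSONL output — the record shape and field names (outputs, status, ndecoded) are assumptions for illustration, not the actual schema:

    import json
    import sys

    # Assumed record shape (illustrative):
    #   {"input": "...", "outputs": [{"status": "S_SUCCESS", "ndecoded": 3, ...}, ...]}
    for line in sys.stdin:
        cohort = json.loads(line)
        outputs = cohort["outputs"]
        # Only consider cohorts where every decoder succeeded.
        if not all(o["status"] == "S_SUCCESS" for o in outputs):
            continue
        # Flag the cohort if the successful decoders disagree on size.
        if len({o["ndecoded"] for o in outputs}) > 1:
            print(json.dumps(cohort))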

Fix the Zydis worker

Zydis keeps changing their C API and I've been too lazy to update the worker to reflect it, so the current submodule is several months out of date.

Make it easier to control which workers are built

As we continue to add more workers (#5, #8), users will probably want to restrict their builds to just those workers/fuzzer targets that they care about.

They can do that right now by editing the WORKERS variable in src/worker/Makefile, but that's a little tedious. We should expose a variable or other mechanism in the top-level Makefile to limit which workers are built.

Unit tests

It'd be nice to have some basic tests in CI, probably using mishegos's manual mode to feed in some simple inputs.
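
A minimal smoke test might look like the following sketch. The -m flag and the hex-candidates-on-stdin behavior for manual mode are assumptions here, purely for illustration; the real invocation should be taken from the README:

    import subprocess

    # NOP and RET as hex-encoded candidates; "-m" (manual mode) is a
    # hypothetical flag used here purely for illustration.
    candidates = b"90\nc3\n"

    proc = subprocess.run(
        ["./src/mishegos/mishegos", "-m", "./workers.spec"],
        input=candidates,
        capture_output=True,
        check=True,  # raises if mishegos exits nonzero
    )
    assert proc.stdout, "expected at least one cohort on stdout"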

mish2jsonl tool fails after following the build instructions on a fresh Ubuntu 20.04 installation, but works in the Docker image

Problem: mish2jsonl breaks whenever it is run from my main OS shell (Ubuntu 20.04); running it from a clean Docker image works fine.

The main host had no development tools installed, and ran a copy-pasted version of the Docker build steps as an install script, minus the Docker-specific setup.
The python, python3, pip3, and ruby versions are identical across the host and the image. All python3 modules from the Docker image are also present on the main host.

The same input file was used in Docker and on the host.
The input file was tested both in /tmp/ and in the root directory of the repository.

Has anybody else run into this?

./src/mish2jsonl/mish2jsonl < /tmp/mishegos

Traceback (most recent call last):
  File "./src/mish2jsonl/mish2jsonl", line 49, in <module>
    main()
  File "./src/mish2jsonl/mish2jsonl", line 12, in main
    cohorts = Cohorts.from_io(sys.stdin.buffer)
  File "/home/$user/.local/lib/python3.8/site-packages/kaitaistruct.py", line 47, in from_io
    return cls(KaitaiStream(io))
  File "/home/$user/dev/mishegos/src/mish2jsonl/cohorts.py", line 26, in __init__
    self._read()
  File "/home/$user/dev/mishegos/src/mish2jsonl/cohorts.py", line 32, in _read
    self.cohorts.append(self._root.Cohort(self._io, self, self._root))
  File "/home/$user/dev/mishegos/src/mish2jsonl/cohorts.py", line 41, in __init__
    self._read()
  File "/home/$user/dev/mishegos/src/mish2jsonl/cohorts.py", line 49, in _read
    self.outputs[i] = self._root.Output(self._io, self, self._root)
  File "/home/$user/dev/mishegos/src/mish2jsonl/cohorts.py", line 58, in __init__
    self._read()
  File "/home/$user/dev/mishegos/src/mish2jsonl/cohorts.py", line 61, in _read
    self.status = KaitaiStream.resolve_enum(self._root.DecodeStatus, self._io.read_u4le())
AttributeError: type object 'KaitaiStream' has no attribute 'resolve_enum'
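
This looks like runtime/compiler version skew: KaitaiStream.resolve_enum appears to have been added in the 0.9 Kaitai Struct Python runtime, and parsers generated by a 0.9+ kaitai-struct-compiler call it, so an older pip-installed kaitaistruct on the host would fail exactly like this while the Docker image (with a matching pair) works. A quick check, as a sketch:

    import kaitaistruct
    from kaitaistruct import KaitaiStream

    # Parsers generated by kaitai-struct-compiler >= 0.9 call
    # KaitaiStream.resolve_enum, which older runtimes don't define.
    print("kaitaistruct runtime:", kaitaistruct.__version__)
    print("has resolve_enum:", hasattr(KaitaiStream, "resolve_enum"))

If resolve_enum is missing, pip3 install --upgrade kaitaistruct should bring the runtime in line with the generated cohorts.py.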

Compare against real silicon behaviour

Something that we had planned to look into sooner or later is diffing against the behaviour observed from actual CPUs, in a sandsifter-like fashion. Would that be beyond the scope of this project or would you be interested in adding something like that as well?

Regression pipeline

Now that we have CI configured and automated submodule updates via Dependabot, it'd be interesting to hack together some kind of basic regression testing for our implemented workers. Something like this:

  1. We run mishegos for a few hours to generate a decently sized corpus
  2. We run that corpus through our analysis passes to filter out true positives and negatives
  3. We save the filtered corpus somewhere (cloud storage?)
  4. On each Dependabot PR, we run additional CI jobs to see whether any of our results have changed (see the sketch below)
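
For step 4, the comparison could start as simple set arithmetic over the interesting inputs, as in this sketch (baseline.jsonl and the input field name are assumptions):

    import json
    import sys

    # usage: compare.py current.jsonl
    def interesting_inputs(path):
        # One JSON cohort per line, keyed on a hex "input" field
        # (the field name is illustrative).
        with open(path) as f:
            return {json.loads(line)["input"] for line in f if line.strip()}

    baseline = interesting_inputs("baseline.jsonl")
    current = interesting_inputs(sys.argv[1])

    for label, inputs in (("new", current - baseline), ("resolved", baseline - current)):
        for hexstr in sorted(inputs):
            print(f"{label}: {hexstr}")

    # A nonzero exit fails the CI job whenever the results drift.
    sys.exit(1 if current != baseline else 0)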

Allow user to suppress result cohorts for uniform negatives

Right now, mishegos spits out every single cohort for every single candidate tried, even if every decoder agrees that the input is invalid. This results in extremely large outputs by default, since even guided searches of the x86 encoding space produce large numbers of garbage instruction candidates.

We should preserve the default behavior, but introduce a new flag/environment variable that tells mishegos to filter out uniformly negative results.
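
In the meantime, the same effect can be approximated after the fact on the JSONL output; assuming illustrative field names, and treating anything other than S_SUCCESS as a negative:

    import json
    import sys

    for line in sys.stdin:
        cohort = json.loads(line)
        # Drop the cohort only if every worker rejected the input
        # ("outputs"/"status" are assumed field names).
        if any(o["status"] == "S_SUCCESS" for o in cohort["outputs"]):
            sys.stdout.write(line)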

Disable the udis86 worker/decoder

udis86 appears not to have received any changes since 2014 and is currently imposing a maintenance burden on mishegos's build environment (because it needs Python 2), so we should just go ahead and disable it for now.

Document the process of adding a worker

I need to do this at some point, but here's the 30-second version:

  1. Add a new directory under ./src/worker/
  2. Submodule the worker's dependencies under the new directory, if necessary
  3. Update WORKERS in ./src/worker/Makefile to include the new worker
  4. Update the find expression in ALL_SRCS in the top-level Makefile to exclude the submodule(s)
  5. Update workers.spec to list the new worker's shared object
