trailofbits / mishegos
A differential fuzzer for x86 decoders
License: Apache License 2.0
We should use GitHub Actions to test mishegos's build, instead of Travis.
One easy win for mishegos's performance would be to change the cohort output format from JSON (via parson) to something that requires less serialization work/fewer allocations.
Some options:
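One possible direction, sketched under stated assumptions: replace per-cohort JSON with fixed-width, little-endian binary records written straight into a caller-owned buffer, so nothing is allocated or formatted per field. The `output_rec` struct, the field layout, `MAX_RESULT_LEN`, and the `encode_cohort` name are all hypothetical and are not mishegos's real structures.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified record for illustration; not mishegos's real layout. */
#define MAX_INSN_LEN 15    /* longest legal x86 instruction */
#define MAX_RESULT_LEN 128 /* assumed cap on decoded text */

typedef struct {
  uint32_t status;             /* decoder status code */
  uint16_t ndecoded;           /* bytes the decoder consumed */
  uint16_t len;                /* length of result[] in bytes */
  char result[MAX_RESULT_LEN]; /* decoded text, not NUL-terminated on the wire */
} output_rec;

/* Store v as little-endian bytes at p; return the number of bytes written. */
static size_t put_u16le(uint8_t *p, uint16_t v) {
  p[0] = (uint8_t)(v & 0xff);
  p[1] = (uint8_t)(v >> 8);
  return 2;
}

static size_t put_u32le(uint8_t *p, uint32_t v) {
  for (int i = 0; i < 4; i++) p[i] = (uint8_t)(v >> (8 * i));
  return 4;
}

/* Layout: [u8 input_len][input bytes][u8 nouts], then for each output
 * [u32 status][u16 ndecoded][u16 len][len bytes of text].
 * Returns the number of bytes written, or 0 if the cohort doesn't fit. */
size_t encode_cohort(uint8_t *buf, size_t cap, const uint8_t *input,
                     uint8_t input_len, const output_rec *outs, uint8_t nouts) {
  if (input_len > MAX_INSN_LEN) return 0;
  size_t need = 2 + (size_t)input_len;
  for (uint8_t i = 0; i < nouts; i++) {
    if (outs[i].len > MAX_RESULT_LEN) return 0;
    need += 8 + (size_t)outs[i].len;
  }
  if (need > cap) return 0;

  size_t off = 0;
  buf[off++] = input_len;
  memcpy(buf + off, input, input_len);
  off += input_len;
  buf[off++] = nouts;
  for (uint8_t i = 0; i < nouts; i++) {
    off += put_u32le(buf + off, outs[i].status);
    off += put_u16le(buf + off, outs[i].ndecoded);
    off += put_u16le(buf + off, outs[i].len);
    memcpy(buf + off, outs[i].result, outs[i].len);
    off += outs[i].len;
  }
  return off;
}
```

The caller can reuse a single buffer across cohorts and pass the result to `fwrite`; a fixed layout like this is also straightforward to describe for a separate reader.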
The following analyses are needed:
- size-discrepancies: Find all cohorts where all decoders succeed, but one or more disagree on the decoded instruction's size.
- destroy-xed: Try to find errors in XED by pressuring it against Zydis, bddisasm, and iced.
- single-status-discrepancy: Exactly one decoder disagrees with all of the others on the instruction's validity (treating everything that isn't S_SUCCESS as a failure).
Some additional targets that could use a worker implementation:
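Of the analyses listed above, size-discrepancies and single-status-discrepancy reduce to simple predicates over a cohort's outputs. A minimal sketch, assuming a hypothetical `decoder_output` record holding each decoder's status and reported instruction size; only `S_SUCCESS` comes from the issue text, and the enum values are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes; only S_SUCCESS is named in the issue text. */
typedef enum { S_SUCCESS = 1, S_FAILURE = 2 } decode_status;

typedef struct {
  decode_status status;
  size_t ndecoded; /* instruction size reported by this decoder */
} decoder_output;

/* size-discrepancies: every decoder succeeded, but the sizes differ. */
bool has_size_discrepancy(const decoder_output *outs, size_t n) {
  if (n == 0) return false;
  for (size_t i = 0; i < n; i++)
    if (outs[i].status != S_SUCCESS) return false;
  for (size_t i = 1; i < n; i++)
    if (outs[i].ndecoded != outs[0].ndecoded) return true;
  return false;
}

/* single-status-discrepancy: exactly one decoder disagrees with all the
 * others about validity; anything other than S_SUCCESS counts as failure.
 * Returns the index of the odd decoder out, or -1 if there is none. */
long single_status_discrepancy(const decoder_output *outs, size_t n) {
  size_t nsuccess = 0, last_success = 0, last_failure = 0;
  if (n < 3) return -1; /* with two decoders, the odd one out is ambiguous */
  for (size_t i = 0; i < n; i++) {
    if (outs[i].status == S_SUCCESS) {
      nsuccess++;
      last_success = i;
    } else {
      last_failure = i;
    }
  }
  if (nsuccess == 1) return (long)last_success;     /* lone success */
  if (nsuccess == n - 1) return (long)last_failure; /* lone failure */
  return -1;
}
```

Requiring at least three decoders is a design choice in this sketch: with only two, either decoder could be called the one that disagrees.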
Zydis keeps changing their C API and I've been too lazy to update the worker to reflect it, so the current submodule is several months out of date.
As we continue to add more workers (#5, #8), users will probably want to restrict their builds to just the workers/fuzzer targets they care about.
They can do that right now by editing the WORKERS variable in src/worker/Makefile, but that's a little tedious. We should expose a variable or other mechanism in the top-level Makefile to limit which workers are built.
Per #149: Kaitai is a bit of a mess to install reliably outside of the Docker container.
The cohorts format is simple enough; we should probably just rewrite mish2jsonl in C and re-use mishegos's structure definitions in it.
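Most of a C mish2jsonl would be emitting each record as one JSON line, and the part that is easy to get wrong is string escaping. A minimal sketch, assuming hypothetical field names (`input`, `worker`, `status`, `ndecoded`, `result`) and a hypothetical `emit_output_jsonl` helper; the real tool would fill these from mishegos's own structure definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Write s[0..len) as a JSON string literal, escaping quotes, backslashes,
 * and control characters. Bytes >= 0x80 are passed through unchanged. */
static void json_write_string(FILE *out, const char *s, size_t len) {
  fputc('"', out);
  for (size_t i = 0; i < len; i++) {
    unsigned char c = (unsigned char)s[i];
    if (c == '"' || c == '\\') {
      fputc('\\', out);
      fputc(c, out);
    } else if (c < 0x20) {
      fprintf(out, "\\u%04x", c);
    } else {
      fputc(c, out);
    }
  }
  fputc('"', out);
}

/* Emit one decoder's output as a single JSON line (field names hypothetical).
 * The raw instruction bytes are hex-encoded so the line stays plain ASCII. */
void emit_output_jsonl(FILE *out, const uint8_t *input, size_t input_len,
                       const char *worker, unsigned status, unsigned ndecoded,
                       const char *result, size_t result_len) {
  fputs("{\"input\":\"", out);
  for (size_t i = 0; i < input_len; i++) fprintf(out, "%02x", input[i]);
  fputs("\",\"worker\":", out);
  json_write_string(out, worker, strlen(worker));
  fprintf(out, ",\"status\":%u,\"ndecoded\":%u,\"result\":", status, ndecoded);
  json_write_string(out, result, result_len);
  fputs("}\n", out);
}
```

Passing high bytes through assumes decoder output is ASCII or valid UTF-8; if that doesn't hold, those bytes would need escaping as well.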
We could offer this to projects that are actively using mishegos, as a badge of (dis?)honor.
This should be do-able, especially after the changes in #1302. Just needs documentation and possibly some more small build system fixes.
It'd be nice to have some basic tests in the CI, probably using mishegos's manual mode to feed some simple inputs in.
Problem: Running mish2jsonl from my main OS shell (Ubuntu 20.04) fails, but running it from a clean Docker image works fine.
The main host had no development tools installed; I set it up by running a copy-pasted version of the Docker build as an install script, without the Docker setup code.
The versions of python, python3, pip3, and ruby are identical on the host and in the Docker image. All python3 modules from the Docker image are also present on the main host.
The same input file was used in Docker and on the host.
I tested the input file both in /tmp/ and in the root directory of the repository.
Has anybody else run into this?
./src/mish2jsonl/mish2jsonl < /tmp/mishegos
Traceback (most recent call last):
  File "./src/mish2jsonl/mish2jsonl", line 49, in <module>
    main()
  File "./src/mish2jsonl/mish2jsonl", line 12, in main
    cohorts = Cohorts.from_io(sys.stdin.buffer)
  File "/home/$user/.local/lib/python3.8/site-packages/kaitaistruct.py", line 47, in from_io
    return cls(KaitaiStream(io))
  File "/home/$user/dev/mishegos/src/mish2jsonl/cohorts.py", line 26, in __init__
    self._read()
  File "/home/$user/dev/mishegos/src/mish2jsonl/cohorts.py", line 32, in _read
    self.cohorts.append(self._root.Cohort(self._io, self, self._root))
  File "/home/$user/dev/mishegos/src/mish2jsonl/cohorts.py", line 41, in __init__
    self._read()
  File "/home/$user/dev/mishegos/src/mish2jsonl/cohorts.py", line 49, in _read
    self.outputs[i] = self._root.Output(self._io, self, self._root)
  File "/home/$user/dev/mishegos/src/mish2jsonl/cohorts.py", line 58, in __init__
    self._read()
  File "/home/$user/dev/mishegos/src/mish2jsonl/cohorts.py", line 61, in _read
    self.status = KaitaiStream.resolve_enum(self._root.DecodeStatus, self._io.read_u4le())
AttributeError: type object 'KaitaiStream' has no attribute 'resolve_enum'
Something that we had planned to look into sooner or later is diffing against the behaviour observed from actual CPUs, in a sandsifter-like fashion. Would that be beyond the scope of this project or would you be interested in adding something like that as well?
Now that we have CI configured and automated submodule updates via Dependabot, it'd be interesting to hack together some kind of basic regression testing for our implemented workers. Something like this:
Right now, mishegos spits out every single cohort for every single candidate tried, even if every decoder agrees that the input is invalid. This results in extremely large outputs by default, since even guided searches of the x86 encoding space produce large numbers of garbage instruction candidates.
We should preserve the default behavior, but introduce a new flag/environment variable that tells mishegos to filter out uniformly negative results.
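A minimal sketch of the opt-in filter, assuming hypothetical status values and a hypothetical `MISHEGOS_FILTER_NEGATIVES` environment variable (neither is an existing mishegos option). "Uniformly negative" is read here as every decoder reporting an ordinary failure, so a cohort where one decoder crashed is still kept.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes for illustration. */
typedef enum { S_SUCCESS = 1, S_FAILURE, S_CRASH } decode_status;

/* True only if every decoder agreed the input is invalid. */
bool cohort_uniformly_negative(const decode_status *statuses, size_t n) {
  if (n == 0) return false;
  for (size_t i = 0; i < n; i++)
    if (statuses[i] != S_FAILURE) return false;
  return true;
}

/* Opt-in switch (hypothetical variable name); the default keeps every cohort. */
bool filter_negatives_enabled(void) {
  const char *v = getenv("MISHEGOS_FILTER_NEGATIVES");
  return v != NULL && strcmp(v, "1") == 0;
}

/* Decide whether a cohort should be written to the output. */
bool should_emit_cohort(const decode_status *statuses, size_t n, bool filter) {
  return !filter || !cohort_uniformly_negative(statuses, n);
}
```

The environment check would run once at startup, and the output loop would skip any cohort for which `should_emit_cohort` returns false.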
udis86 appears not to have received any changes since 2014 and currently imposes a maintenance burden on mishegos's build environment (because it needs Python 2), so we should just go ahead and disable it for now.
I need to do this at some point, but the 30 second version:
- Add the new worker under ./src/worker/
- Update WORKERS in ./src/worker/Makefile to include the new worker
- Update the find expression in ALL_SRCS in the top-level Makefile to exclude the submodule(s)
- Update workers.spec to list the new worker's shared object
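For orientation, here is a toy worker that "decodes" only the one-byte NOP (0x90). The exported names `worker_name` and `try_decode`, the `output_slot` layout, and the status values are assumptions made for this sketch, loosely modeled on how mishegos workers are described; check the project's worker header for the real interface before copying any of it.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified result slot; the real definition may differ. */
typedef enum { S_SUCCESS = 1, S_FAILURE = 2 } decode_status;

typedef struct {
  decode_status status;
  uint16_t ndecoded; /* bytes consumed */
  uint16_t len;      /* length of result text */
  char result[64];   /* decoded text */
} output_slot;

/* Name this worker reports (assumed export). */
const char *worker_name = "toy";

/* Toy decoder: accepts only 0x90 (NOP) and rejects everything else.
 * A real worker would call into its decoder library here. */
void try_decode(output_slot *result, const uint8_t *raw_insn, uint8_t length) {
  memset(result, 0, sizeof *result);
  if (length >= 1 && raw_insn[0] == 0x90) {
    result->status = S_SUCCESS;
    result->ndecoded = 1;
    result->len = (uint16_t)snprintf(result->result, sizeof result->result, "nop");
  } else {
    result->status = S_FAILURE;
  }
}
```

Built as a shared object (for example with `cc -shared -fPIC`), a worker like this would then be wired into WORKERS and workers.spec as in the steps above.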