libafl-legacy's Issues

Order of implementation of features

To keep the work needed to push changes from afl++ into libafl as small as possible, I think we should define the order in which the library features are to be implemented.

So using mostly the names from: https://github.com/AFLplusplus/Website/blob/master/content/aflpp_fuzzing_framework_proposal.md

This starts with queue and then mutator, as these are the least likely to change. The further down the list, the more likely there will be changes, either in afl++ or in how we envision which features are useful along the way.

queue : Structure & Handling (Struct; Get, Add, Remove)
queue : Generic seed scheduler (-p stuff)
queue : Generic seed scheduler : custom
queue : Trim (for new entries)
queue : Trim : custom
Mutator : structure
Mutator : deterministic
Mutator : havoc
Mutator : mopt
Mutator : custom
Executor (Forkserver, Fauxserver, Network Connector): Structure
Executor : Input Channel (A way to send a new testcase to the target (multiple can be stacked)) > andrea: why multiple? marc: maybe so this can be threaded? One controller creates the inputs and puts them in a queue, and the executor passes them on one by one. But IMHO this would not be good for performance: too much complexity, too much code.
Executor: Observation Channel (shared mem, data from a tcp connection)
Feedback : Structure
Feedback : Converter (Reduce input channel information to something useful)
Feedback : Analysis crash or new path, etc.

Not sure what this is:
Virtual Input (hold input buffer and associated metadata (e.g. structure))
Feedback
Feedback specific queue
Feedback specific seed scheduler
Feedback specific seed energy

feel free to change or discuss

How to multicore by dom

My concept:

  • The main thread parses commandline flags, loads all test cases from disk, spawns all threads ("engines"). All engines have unique seeds (like main seed + engine id).
  • The main thread makes sure there are more queue elements than threads, worst case it clones them.
  • Afterwards, the main thread just sleeps, shows some stats, sleeps, writes to the sync dir/disk, and tears down the workers on ctrl+c.
  • The queue(s) (linked lists for now) live in a shared map; each queue element includes an in_use flag.
  • If an engine looks for the next queue element to fuzz, it walks the queue (or starts from the beginning), keeping an internal ptr, and does a CAS on in_use. If the CAS fails, it advances the ptr to the next element and tries again.
  • After fuzzing, in_use is reset.
  • If a new queue entry needs to be added, a global lock for this queue is acquired (spinlock), then the new element is added.
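
The claim/release step above could look roughly like the following. This is only a sketch using C11 atomics: `queue_entry_t`, `claim_next_entry` and `release_entry` are made-up names, and the global insert lock is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queue element; the real layout is not decided in this issue. */
typedef struct queue_entry {
  atomic_bool in_use;       /* true while some engine fuzzes this entry */
  struct queue_entry *next; /* linked list, as proposed above */
} queue_entry_t;

/* Walk the list starting at `cursor` (or at `head` if cursor is NULL),
   wrapping around at the end. Returns the first entry whose in_use flag
   could be flipped from false to true, or NULL if all entries are taken. */
static queue_entry_t *claim_next_entry(queue_entry_t *head,
                                       queue_entry_t *cursor) {
  assert(head != NULL);
  if (cursor == NULL) cursor = head;
  queue_entry_t *start = cursor;
  do {
    bool expected = false;
    /* The CAS only succeeds for one engine at a time. */
    if (atomic_compare_exchange_strong(&cursor->in_use, &expected, true)) {
      return cursor;
    }
    cursor = cursor->next ? cursor->next : head;
  } while (cursor != start);
  return NULL;
}

/* After fuzzing, in_use is reset so other engines can pick the entry. */
static void release_entry(queue_entry_t *entry) {
  atomic_store(&entry->in_use, false);
}
```

Each engine would keep the returned pointer as its internal cursor for the next call. Because the main thread keeps more queue elements than threads, the NULL case should be rare.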

Downsides:

  1. All threads work on the same shared map
  2. Inserts can be slow (however, that will only be a problem in the first minutes)

Cleanup Makefiles

The Makefiles right now are far from ideal...
It would be great to have them cleaned up a little.

AFL-Style Testcase Support

Right now, in the example, each thread outputs lines to its own little file.
Instead, the broker or a background thread could write new queue entries to disk whenever a new queue message flies by.
On top of that, we should add the option to resume a longer-running fuzzing campaign, and maybe even sync with AFL workers. This could also be done in the background thread.

Autogenerate Docs & more Documentation

Right now, all documentation is inside the code.
We should set up a sphinx instance or some other nice way to make the API browsable.
To get started with sphinx, there is this project (not super actively maintained) which may just be enough or this blog post that seems to be a bit more work, going through doxygen first.
I didn't find a way to get the sphinx thing going directly, without additional tools, but maybe there's an easy way.

Once we have decided on a documentation format, we can start writing more documentation.

In-Mem Crash Recovery

Right now, a crashing input in the In-Memory Fuzzer will be lost.
This is rather unfortunate. As we use shared maps/llmp for fuzzing, the input is still around after a crash, and a new process, or even the broker, should be able to recover it.

Crash analysis - more info about crash in/from LibAFL

Wrote a harness to fuzz a lib using LibAFL.

I get a crash within LibAFL, but cannot reproduce it outside LibAFL (almost identical harness ... outside LibAFL I added just main() function)

It would be great if LibAFL could save the stack trace, registers, etc. alongside the crash file when a crash is observed, to help debug it.

Could be that it is an issue in my harness or LibAFL.

Having this info would help to debug issues with harnesses, also for other projects and other users.

Thanks,

P.S. Cool project

How to multicore by marc

Goals:

  • no locks, because they are a bottleneck.
  • no or very little heap, because a target running havoc could trash it
  • even if the threads all die the data is still there

There is one primary process which is the started process.
It forks off a single secondary, which is responsible for spawning the fuzzing children.
It also creates a central shmem where all fuzzers register (a unique id, shmem IDs of queue, dictionary, ...).
It also occasionally (e.g. every 30 minutes) persists the data to disk. Besides that, and recovering if all threads die, it does (so far) nothing.

The secondary spawns off as many fuzzing threads as required and is itself also a fuzzer.

Every fuzzing thread first registers itself in the register. This needs a lock, once.
If a thread crashes, it will no longer update a timing field in its main map. The primary can occasionally do an alive check and tell the secondary to respawn the thread, or fork off another secondary which does that.
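
A rough sketch of that alive check, assuming a millisecond timestamp as the timing field. `fuzzer_info_t` and both function names are invented for illustration, and the current time is passed in so the sketch does not depend on any particular clock API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-fuzzer info map; only its owner ever writes to it. */
typedef struct {
  _Atomic uint64_t last_alive_ms; /* the "timing field" */
} fuzzer_info_t;

/* Called by the fuzzing thread, e.g. once per fuzzing loop iteration. */
static void fuzzer_heartbeat(fuzzer_info_t *info, uint64_t now_ms) {
  atomic_store(&info->last_alive_ms, now_ms);
}

/* Called by the primary (read-only access): true if the fuzzer updated
   its timing field within the last `timeout_ms` milliseconds. */
static bool fuzzer_is_alive(fuzzer_info_t *info, uint64_t now_ms,
                            uint64_t timeout_ms) {
  assert(timeout_ms > 0);
  uint64_t last = atomic_load(&info->last_alive_ms);
  /* A timestamp from the "future" is treated as alive, not as stale. */
  return now_ms < last || now_ms - last <= timeout_ms;
}
```

The primary only reads the field, so this keeps the "no writing into foreign memory" rule intact.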

Every fuzzer writes into its own maps (e.g. info map, queue map, dict map, service request map, ...) and occasionally visits all other fuzzers (shmem IDs from the register) for stuff it is interested in. It has only read access to other fuzzers' maps, no writing.
E.g. selecting a queue entry could be rand_below(number_of_fuzzers) -> rand_below(selected_fuzzer->queue->no_of_entries).
(This needs more glue, as the selection should be based on a calibration and weighting of the entries.)
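
The two-step selection could be sketched like this. It is the unweighted version only: `fuzzer_queue_t`, `queue_pick_t` and `pick_queue_entry` are hypothetical, and the xorshift PRNG is just a stand-in for whatever rand_below() ends up using.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical read-only view of one fuzzer's queue map. */
typedef struct {
  size_t no_of_entries;
} fuzzer_queue_t;

typedef struct {
  size_t fuzzer_idx; /* which fuzzer's queue was selected */
  size_t entry_idx;  /* which entry inside that queue */
} queue_pick_t;

/* Stand-in PRNG (xorshift64); the state must be non-zero. */
static uint64_t xorshift64(uint64_t *state) {
  uint64_t x = *state;
  x ^= x << 13;
  x ^= x >> 7;
  x ^= x << 17;
  *state = x;
  return x;
}

/* Returns a value in [0, limit). Slight modulo bias is ignored here. */
static size_t rand_below(uint64_t *state, size_t limit) {
  assert(limit > 0);
  return (size_t)(xorshift64(state) % limit);
}

/* First pick a fuzzer, then pick an entry in its queue.
   Returns false if the selected fuzzer has no entries yet. */
static bool pick_queue_entry(const fuzzer_queue_t *queues,
                             size_t number_of_fuzzers, uint64_t *rng_state,
                             queue_pick_t *out) {
  assert(number_of_fuzzers > 0);
  size_t f = rand_below(rng_state, number_of_fuzzers);
  if (queues[f].no_of_entries == 0) return false;
  out->fuzzer_idx = f;
  out->entry_idx = rand_below(rng_state, queues[f].no_of_entries);
  return true;
}
```

Note that this gives every fuzzer the same chance regardless of queue size, so entries in small queues are picked more often. That is one reason the weighting mentioned above would be needed.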

And that is already it. Except for the central register, there are no locks and no writes into foreign memory.
So there should be no performance problems from lock contention.

And with knowledge of the shmem ID of the register, you can even start different fuzzers that join in.

Another idea I had: a fuzzer can perform services for other fuzzers.
E.g. if fuzzer A needs a path to be solved where it has no finds, it writes into its service request map a request to handle queue entry X with cmplog and laf.
A fuzzer B that has cmplog occasionally visits the service request maps of others. If it sees something it can perform and has nothing better to do, it performs the request and writes into its own service request map that the request was completed.

Oracle class

A fuzzer can look for something different than crashes. Think about timeouts for instance.

The oracle class looks at observation channels, similarly to feedback, but it decides if the testcase is crashing, not if it is interesting.
For regular fuzzing, you should define the exit code of a process as an observation channel. To make the AFL oracle, you need two observation channels that are the exit code and the coverage map.

The dedup (you can also choose to make dedup part of the oracle rather than a separate entity) takes a crashing input, looks at the observation channels, and decides whether the crash is worth saving (i.e. not a duplicate). In AFL, this just uses the feedback to see if the crash triggers new coverage.

For timeouts, the oracle will use the coverage and the timing observation channels. I hope it is clear.
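
To make the split between oracle and dedup more concrete, here is a minimal sketch. All names are hypothetical. The observation channels are flattened into one struct, the oracle's verdict uses only the exit and timing channels, and the coverage channel is left to the dedup step, which is reduced to "did this crash hit a map index that no earlier crash hit".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the observation channels after one run. */
typedef struct {
  int exit_signal;         /* signal that killed the target, 0 if none */
  uint64_t exec_time_us;   /* timing observation channel */
  const uint8_t *coverage; /* coverage map observation channel */
  size_t coverage_len;
} observations_t;

typedef enum { VERDICT_OK, VERDICT_CRASH, VERDICT_TIMEOUT } verdict_t;

/* Oracle: decides whether the testcase is a crash or a timeout,
   not whether it is interesting (that is the feedback's job). */
static verdict_t oracle_judge(const observations_t *obs, uint64_t timeout_us) {
  assert(obs != NULL);
  if (obs->exec_time_us > timeout_us) return VERDICT_TIMEOUT;
  if (obs->exit_signal != 0) return VERDICT_CRASH;
  return VERDICT_OK;
}

/* Dedup: returns true if the crash touches coverage that no earlier
   saved crash touched, and records it in `seen` (same length as the
   coverage map). Simplified: no hit-count buckets. */
static bool dedup_is_new(const observations_t *obs, uint8_t *seen) {
  bool is_new = false;
  for (size_t i = 0; i < obs->coverage_len; i++) {
    if (obs->coverage[i] && !seen[i]) {
      seen[i] = 1;
      is_new = true;
    }
  }
  return is_new;
}
```

A fuzzer loop would first call the oracle and only pass crashing or timed-out inputs on to the dedup step, with separate `seen` maps for crashes and timeouts.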

C++ Bindings

We need a nice way to interface with LibAFL from C++.

When should we go public?

Rishi did good work, but ofc libafl is a greater thing than GSoC and it will not be ready for the end of August.

Should we make it public to highlight the contribution of Rishi? IIRC this is not needed for GSoC, and if he writes a blog post this should be fine.

We can publish it with a WIP disclaimer while slowly migrating pieces of AFL++ here. My main concern is that other people can "steal" good ideas from here before we finish, because maintaining compatibility with AFL will take more time than rewriting a fuzzer from scratch.

What should we do?

Compile problem under Debian Buster 32-Bit

When compiling examples/libpng-1.6.37/contrib/libtests/pngvalid.c, I get
clang -DHAVE_CONFIG_H -I. -g -fPIC -Iinclude -Wall -Wextra -Werror -Wshadow -fstack-protector-strong -D_FORTIFY_SOURCE=2 -O3 -MT contrib/libtests/pngvalid.o -MD -MP -MF $depbase.Tpo -c -o contrib/libtests/pngvalid.o contrib/libtests/pngvalid.c &&
mv -f $depbase.Tpo $depbase.Po
In file included from contrib/libtests/pngvalid.c:26:
In file included from /usr/include/signal.h:25:
/usr/include/features.h:185:3: error: "_BSD_SOURCE and _SVID_SOURCE are deprecated, use _DEFAULT_SOURCE" [-Werror,-W#warnings]
# warning "_BSD_SOURCE and _SVID_SOURCE are deprecated, use _DEFAULT_SOURCE"
^
1 error generated.
make[3]: *** [Makefile:1196: contrib/libtests/pngvalid.o] Error 1

pngvalid.c line 24 has
#define _BSD_SOURCE 1 /* For the floating point exception extension */

The problem is in the Makefile, which exports CFLAGS instead of passing CFLAGS as a make argument.

Get some CI going

Right now, we don't have any CI for LibAFL.
However, we do have unit tests and examples that can serve as test cases. These should be added to Travis, similar to AFL++.

AFL++ Custom Mutator Support

Right now LibAFL only supports its own, internal, mutators at build time. It may be beneficial to add a wrapper around the current mutators that can interface with existing Custom Mutators (at least those in C).

Status update

@rish9101 I am not one of your GSoC mentors, but since your work is closely related to my specification for a fuzzing framework, can you update me on the status of your project?

I guess, at this point, you reimplemented my FFF PoC in C and added things from AFL++. Can I build a fuzzer with the current API?

Possible PoC Usecases

We can collect some use cases here that would be nice to make possible using libAFL.

I'll start with "proper" untracer support:
https://twitter.com/buherator/status/1269396963998543872?s=19
Taking away breakpoints as they are hit also means that subsequent runs are not reproducible.
Subsequent runs with the same input should therefore be considered successful on "no new coverage" instead of "same hash".
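
A "no new coverage" check could be as small as the sketch below. It assumes a byte-per-location coverage map; `rerun_is_consistent` is an invented name, not an existing libAFL function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* With an untracer, a re-run reports only breakpoints that are still in
   place, so its map is usually a subset of what was seen before (often
   empty). The re-run counts as successful as long as it hits nothing
   that is missing from `seen`. */
static bool rerun_is_consistent(const uint8_t *seen, const uint8_t *rerun,
                                size_t len) {
  assert(seen != NULL && rerun != NULL);
  for (size_t i = 0; i < len; i++) {
    if (rerun[i] && !seen[i]) return false; /* new coverage: not the same */
  }
  return true;
}
```

A "same hash" comparison would reject the first two cases in the test below, even though they are exactly what an untracer produces on a repeated input.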

Add to Fuzzbench

This is a longer-reaching issue:
We should eventually add LibAFL to Fuzzbench, potentially even the in-mem fuzzer (crushing all other fuzzers :) ).

Directory format

We have entities (e.g. executor), and for each entity we will provide some implementations in libafl (e.g. inmemoryexecutor and forkserverexecutor). These implementations are part of the library, so they should not be in the example folder.

I propose to create a subfolder in include and src for each entity, and to place in each one a main file with the abstract class plus the other implementations.

For instance.

include/executor/executor.h contains only the abstract class
include/executor/in_memory_executor.h contains the implementation

The same applies to src/.
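
As a rough illustration of the split, this is how an "abstract class" and one implementation might look in C. Everything is shown in one block for brevity, with comments marking the intended file. All `demo_` names are made up for this sketch and are not the actual libafl API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* --- would live in include/executor/executor.h (abstract class) --- */
typedef struct demo_executor demo_executor_t;
struct demo_executor {
  /* "Virtual method": runs the target on one input, returns its result. */
  int (*run)(demo_executor_t *self, const uint8_t *input, size_t len);
};

/* --- would live in include/executor/in_memory_executor.h --- */
typedef int (*demo_harness_fn)(const uint8_t *input, size_t len);

typedef struct {
  demo_executor_t base; /* must stay first so the cast in run() is valid */
  demo_harness_fn harness;
} demo_in_memory_executor_t;

/* --- would live in src/executor/in_memory_executor.c --- */
static int demo_in_memory_run(demo_executor_t *self, const uint8_t *input,
                              size_t len) {
  demo_in_memory_executor_t *exec = (demo_in_memory_executor_t *)self;
  return exec->harness(input, len);
}

static void demo_in_memory_executor_init(demo_in_memory_executor_t *exec,
                                         demo_harness_fn harness) {
  assert(harness != NULL);
  exec->base.run = demo_in_memory_run;
  exec->harness = harness;
}

/* --- example harness, would live in the example folder --- */
static int demo_harness_first_byte(const uint8_t *input, size_t len) {
  return len > 0 && input[0] == 'A';
}
```

Code that only knows about `demo_executor_t` can call `run` without caring which implementation sits behind it, which is the point of keeping the abstract class in its own header.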

Btw just look at FFF

Build system

If libafl has to be multi platform, we cannot really use just GNU makefiles.

I propose meson, as it was recently adopted by QEMU and seems a sane build system, more than cmake for sure.

We can maintain both meson files and raw makefiles for linux/bsd if needed.
