
cobrafuzz's People

Contributors

gitlabmike, jvoisin, ksamuel, mj-seo, neuromancer, qpalzmm22, senier, thiago-gitlab


cobrafuzz's Issues

Parallel fuzzing

Utilize multiple CPU cores for fuzzing.

Parallel fuzzing requires refactoring the coverage collection. Right now, coverage is collected in the child process and newly found coverage is sent over a pipe shared between parent and child. The parent then compares the new coverage with the coverage it previously stored and, if coverage increased, keeps the binary returned by the child process.

The current approach has a number of limitations:

  • Coverage is recorded in the child process only. With multiple child processes, a sample may appear to exercise new paths when it merely exercised a path that is new to one specific child process.
  • Binaries are needlessly sent back from child to parent, even though the parent has to keep the sample anyway to handle cases where the child process hangs or times out.
  • The information returned is either a number (the current coverage) or the text of an exception. Ideally, the format should be more structured (e.g. a JSON document).
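A structured result message as suggested in the last point could be serialized as JSON; this is a minimal sketch, and the field names are illustrative rather than an actual cobrafuzz wire format:

```python
import json

# Hypothetical structured report message (field names are illustrative).
report = {
    "kind": "report",
    "worker": 3,
    "job": 17,
    "covered": [["target.py", "parse", 42]],
}
encoded = json.dumps(report)
```

A receiver can then dispatch on the "kind" field instead of parsing free-form text.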

Design:

Child processes

The tracer is changed such that it can be reset.

  • Read the next job from the job queue
  • Reset the tracer
  • Put a status message into the result queue before executing the target
  • Run the target
  • If the target ran successfully, put a report message into the result queue
  • If the target crashed, put an error report into the result queue
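The worker loop above could be sketched as follows; the tracer interface, the message tuples, the `(id, data)` job format and the `None` shutdown sentinel are assumptions for illustration, not the actual cobrafuzz API:

```python
import queue

# Hypothetical worker loop following the steps above.
def worker_loop(worker_id, job_queue, result_queue, target, tracer):
    while True:
        job = job_queue.get()                    # read the next job
        if job is None:                          # sentinel: shut down
            break
        job_id, data = job
        tracer.reset()                           # reset the tracer
        result_queue.put(("status", worker_id, job_id))  # liveness signal
        try:
            target(data)                         # run the target
        except Exception as e:                   # target crashed
            result_queue.put(("error", worker_id, job_id, str(e)))
        else:                                    # target ran successfully
            result_queue.put(("report", worker_id, job_id, tracer.covered()))
```

In the real design the queues would be multiprocessing queues shared with the parent; plain queues suffice to show the control flow.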

Status report

from dataclasses import dataclass

@dataclass
class Status:
    worker: int
    job: int

Coverage report

@dataclass
class Report:
    worker: int
    job: int
    covered: list[tuple[str, str, int]]

Error report

@dataclass
class Error:
    worker: int
    job: int
    message: str

Parent process

  • The parent spawns and starts a configurable number of child processes, passing each its worker ID, the shared job queue and the result queue
  • An object for each child process contains:
    • the child process handle
  • In a loop, the parent
    • generates a new binary and associates it with a unique job ID
    • puts it into the job database under that ID with status submitted and no worker ID
    • submits it to the shared job queue
    • checks for responses in the result queue
      • if the response is an error: retrieve the binary with the corresponding job ID and store it in the crash folder
      • if the response is a status: update the job with worker ID and timestamp in the database
      • if the response is a report: if the reported coverage increases total coverage, store the binary in the samples
      • for report and error responses, delete the job from the database
    • checks timestamps in the job database
      • if the timestamp of a submitted job is older than the timeout, kill and restart the corresponding worker
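The parent's handling of a single result message could be sketched like this; the message tuples and the `jobs`/`coverage`/`crashes`/`samples` containers are illustrative assumptions, not the actual cobrafuzz data structures:

```python
import time

# Hypothetical handling of one message from the result queue.
def handle_result(msg, jobs, coverage, crashes, samples):
    kind = msg[0]
    if kind == "status":                         # worker picked up the job
        _, worker, job_id = msg
        jobs[job_id]["worker"] = worker
        jobs[job_id]["timestamp"] = time.monotonic()
    elif kind == "report":                       # target ran successfully
        _, worker, job_id, covered = msg
        if set(covered) - coverage:              # total coverage increased
            coverage |= set(covered)
            samples.append(jobs[job_id]["data"])
        del jobs[job_id]
    elif kind == "error":                        # target crashed
        _, worker, job_id, message = msg
        crashes.append((jobs[job_id]["data"], message))
        del jobs[job_id]
```

Keeping the sample in `jobs` until a report or error arrives is what lets the parent recover the input of a hung worker from the database alone.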

Job

@dataclass
class Job:
    id: int
    data: bytes

Number of runs calculation seems off

The number of runs calculation on a 256-core system seems much lower than expected, even though the fuzzing results are consistent with the number of cores. Investigate.

Reduce found crash input

Try to shrink the input leading to a crash:

  1. Get the shortest successful crash input (if there are multiple)
  2. Apply a subset of mutations (those reducing size)
  3. If the mutated input is shorter and the same crash still happens, use the mutated input; otherwise pick a new mutation

Strategies for (specific) mutations:

  • Remove a random line
  • Remove a random range of lines
  • Remove whitespace
  • Replace non-printable characters with printable characters
  • Remove characters between semicolons
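The reduction loop could be sketched as below; `crashes_same_way`, the mutation list and the line-dropping mutation are placeholders standing in for the real crash check and the strategies listed above:

```python
import random

# Hypothetical size-reducing mutation: remove a random line.
def drop_random_line(data, rng=random):
    lines = data.split(b"\n")
    if len(lines) > 1:
        del lines[rng.randrange(len(lines))]
    return b"\n".join(lines)

# Sketch of the reduction loop: keep a mutation only if it is strictly
# shorter and still triggers the same crash.
def shrink(data, crashes_same_way, mutations, attempts=100, rng=random):
    best = data
    for _ in range(attempts):
        mutated = rng.choice(mutations)(best)
        if len(mutated) < len(best) and crashes_same_way(mutated):
            best = mutated
    return best
```

Each rejected mutation simply falls through to the next attempt, matching step 3 above.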

Load crash coverage from crash dir on startup

To allow seamless restart of a fuzzing campaign, obtain coverage data for all samples found in the crash dir (if any) and add them to the coverage database without reporting them as a crash.
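A minimal sketch of that startup step, assuming a `run_with_coverage` helper (a placeholder for running the target under the tracer) that returns the covered locations for a sample:

```python
from pathlib import Path

# Hypothetical startup step: replay every sample in the crash directory
# and merge its coverage into the database, without reporting a crash.
def seed_coverage_from_crash_dir(crash_dir, coverage, run_with_coverage):
    for path in sorted(Path(crash_dir).iterdir()):
        if path.is_file():
            coverage |= set(run_with_coverage(path.read_bytes()))
```

After this pass, re-finding an already known crash no longer counts as new coverage.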

Handle crash results according to crash location

Currently, when the corpus contains an example that is likely to trigger a crash, it will be mutated, stored back into the corpus and will likely trigger another crash at the same location. The more examples of the same type of issue that end up in the corpus, the more likely it becomes that the same issue is reproduced over and over again.

Solutions:

  • do not store subsequent occurrences in the corpus
  • alternatively: store them, but make every distinct kind of example equally likely to be selected
  • calculate a stable identifier from the crash location and put all examples under a corresponding directory in the crash directory

Update: This may be as simple as checking the coverage of an error in the coordinator process and only adding / distributing a crash artifact if the corresponding path has not yet been recorded in the coverage database.
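A stable identifier for the third option could be derived by hashing the crash location, e.g. the innermost traceback frame; the `(file, function, line)` shape is an assumption for illustration:

```python
import hashlib

# Hypothetical stable identifier for a crash location. Identical
# locations map to the same ID, so duplicate crashes can be grouped
# under one subdirectory of the crash directory.
def crash_id(location):
    text = ":".join(str(part) for part in location)
    return hashlib.sha256(text.encode()).hexdigest()[:16]
```

Using a hash rather than the raw path keeps the directory name short and filesystem-safe.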

Implement power schedule

A power schedule can be applied to the corpus (which input is selected for mutation) and to the mutation functions (which mutations are applied). Candidate criteria:

  • length of seed
  • execution time
  • coverage increases
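Those criteria could feed a weighted seed selection like the sketch below; the weight formula and the corpus entry fields are assumptions, not the cobrafuzz design:

```python
import random

# Hypothetical energy assignment: shorter, faster and more
# coverage-increasing seeds get proportionally more weight.
def seed_weight(length, exec_time, coverage_gain):
    return (coverage_gain + 1) / ((length + 1) * (exec_time + 1e-6))

# Pick the next seed to mutate with probability proportional to weight.
def pick_seed(corpus, rng=random):
    weights = [seed_weight(len(s["data"]), s["time"], s["gain"]) for s in corpus]
    return rng.choices(corpus, weights=weights, k=1)[0]
```

The same weighting idea applies to mutation functions, keyed on how often each mutation produced new coverage.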

Cleanup of directory and existing state handling

The fuzzer does not seem to load previous state from the fuzzing directory, nor does it load state from the crashes directory.

Clean up the directory handling such that

  • An optional state file is passed to the fuzzer as a separate argument
  • Known paths are loaded from the state file on startup
  • If a state file was configured, the state is regularly saved to that file
  • The crashes directory is passed to the fuzzer as a separate argument

Ref. #13
Related: #25

Use cryptographic randomness

Currently the fuzzer often creates bit-identical crash samples. With proper randomness this should be next to impossible. Check that all random sources are truly random.
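One concrete option is Python's `random.SystemRandom`, which draws from the OS entropy pool (`os.urandom`) and so cannot repeat a mutation sequence the way a PRNG seeded with a constant, or a time-based seed shared by several workers, can. The bit-flip mutation below is an illustrative example, not a cobrafuzz function:

```python
import random

# OS-backed randomness: no seed, no reproducible sequence across runs.
_rng = random.SystemRandom()

# Hypothetical mutation: flip one random bit of the input.
def flip_random_bit(data, rng=_rng):
    if not data:
        return data
    pos = rng.randrange(len(data))
    bit = 1 << rng.randrange(8)
    return data[:pos] + bytes([data[pos] ^ bit]) + data[pos + 1:]
```

Note that unseeded OS randomness trades away reproducibility; a per-run random seed logged at startup would keep both properties.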

Re-add memory limit and deadlock detection

Idea: Create a shared memory area for each worker (shared with the coordination process). Before each execution, the worker stores the current binary in that shared memory area. The coordination process periodically checks the liveness of each worker. When a configurable amount of time has passed without a status update, the worker is killed, the binary in the respective shared memory area is stored as a timeout artifact, and the worker is restarted.

TODO: Find a way to detect memory exhaustion and handle it in a similar way (assuming the process gets killed when memory is exhausted). E.g. analyzing the exit code for fatal error signals might be a solution.
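The exit-code analysis could look like this sketch: a `multiprocessing.Process` killed by a signal (e.g. SIGKILL from the kernel OOM killer) has a negative `exitcode` equal to `-signum` after `join()`, which the coordinator could use to store the last sample from shared memory as an out-of-memory artifact:

```python
# Hypothetical check on a joined multiprocessing.Process: a negative
# exitcode means the worker was terminated by signal -exitcode.
def killed_by_signal(process):
    code = process.exitcode
    if code is not None and code < 0:
        return True, -code
    return False, None
```

This does not distinguish OOM kills from other fatal signals, but it is enough to avoid silently losing the sample.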

Make crashes directory configurable

Currently, either an exact artifact path (storing a single file) is configured, or the default directory crashes is used for storing crash results. Rework this to allow providing an alternative directory (also required for #11).
