

vitalybuka commented on July 28, 2024

I assumed I solved the issue a few months ago with:

parser.SetRecursionLimit(100);

Is it possible that a limit of just 100 breaks your binary?

from libprotobuf-mutator.

dgg5503 commented on July 28, 2024

Hi @vitalybuka,

Apologies for bumping a 5-year-old thread; however, I believe I've encountered an issue like this one. When LPM is integrated with libFuzzer, it can generate messages nested more deeply than the text or binary parsers allow. This is problematic in the following situation:

  • A .proto defines a data structure that allows recursively nested messages, for example this stripped-down .proto for a subset of JSON:
    syntax = "proto2";
    package data_structure;
    
    message Input {
      required Element element = 1;
    }
    
    message Value {
      oneof value {
        Array array = 1;
        /* other value types supported by JSON would live here */
      }
    }
    
    message Element {
      required Value value = 1;
    }
    
    message Array {
      repeated Element elements = 1;
    }
    
  • A fuzzer is created using the above structure with LPM.
  • After enough iterations, LPM begins to generate deeply nested messages (more than 100 levels deep).
  • Suppose one of these deeply nested messages signals new coverage to the fuzzing engine.
  • The fuzzer serializes the deeply nested message and saves it to the corpus.
  • The user restarts the fuzzer.
  • Corpus inputs are executed; however, the following error appears because one of those messages is beyond the parser's limit:
    Error parsing text-format data_structure.Input: 131:207: Message is too deep, the parser exceeded the configured recursion limit of 100.
    
  • Coverage is no longer the same after reading all inputs from the corpus, since the above input failed to parse and was therefore never executed, as per:
    #define DEFINE_TEST_ONE_PROTO_INPUT_IMPL(use_binary, Proto)                 \
      extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) { \
        using protobuf_mutator::libfuzzer::LoadProtoInput;                      \
        Proto input;                                                            \
        /* inputs that fail to parse are silently skipped */                    \
        if (LoadProtoInput(use_binary, data, size, &input))                     \
          TestOneProtoInput(input);                                             \
        return 0;                                                               \
      }
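To make the failure mode concrete, here is a minimal sketch, independent of protobuf, that models each nested message field as a child node and mirrors the .proto above. `Node`, `Depth`, `MakeNestedArrays`, and `ExceedsRecursionLimit` are hypothetical names, and the counting rule (one level per nested message field, rejection once the count exceeds the limit) is an assumption about how the parser counts, not protobuf or LPM code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a protobuf message: every entry in `children`
// represents one nested message field.
struct Node {
  std::vector<std::unique_ptr<Node>> children;
};

// Number of nested message levels below `n` (0 when `n` has no children).
std::size_t Depth(const Node& n) {
  std::size_t best = 0;
  for (const auto& child : n.children) {
    std::size_t d = 1 + Depth(*child);
    if (d > best) best = d;
  }
  return best;
}

// Builds Input{element{value{...}}} containing `k` nested arrays. Mirroring
// the .proto, Input -> Element -> Value contributes 2 levels and each Array
// adds Array -> Element -> Value, i.e. 3 more, so Depth() == 2 + 3 * k.
std::unique_ptr<Node> MakeNestedArrays(std::size_t k) {
  auto input = std::make_unique<Node>();
  Node* cur = input.get();
  const std::size_t levels = 2 + 3 * k;
  for (std::size_t i = 0; i < levels; ++i) {
    cur->children.push_back(std::make_unique<Node>());
    cur = cur->children.back().get();
  }
  return input;
}

// Assumption: the parser rejects a message whose nesting exceeds `limit`.
bool ExceedsRecursionLimit(const Node& root, std::size_t limit) {
  return Depth(root) > limit;
}
```

Under this counting, 32 nested JSON arrays give 98 levels and still parse with a limit of 100, while 33 arrays give 101 levels and would be rejected, so only a few dozen arrays are needed to hit the error.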

In this situation, one may increase the recursion limit via SetRecursionLimit; however, even deeper messages can still be generated and saved to disk, especially via crossover. Additionally, the fuzzer, and possibly the function under test, may end up spending more time deserializing, processing, and mutating deeply nested messages, time that could be better spent mutating fields at shallower levels.
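Why raising the limit does not bound depth can be shown with a deliberately simplified, type-agnostic model of crossover (this is not LPM's actual crossover algorithm): if a donor tree of depth s is attached below a node that already sits at depth d, the result has depth d + 1 + s. `Node`, `Depth`, `MakeChain`, `Deepest`, and `GraftAtDeepest` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a message; each child is a nested message field.
struct Node {
  std::vector<std::unique_ptr<Node>> children;
};

// Number of nested levels below `n` (0 for a leaf).
std::size_t Depth(const Node& n) {
  std::size_t best = 0;
  for (const auto& child : n.children) {
    std::size_t d = 1 + Depth(*child);
    if (d > best) best = d;
  }
  return best;
}

// Builds a single chain of `n` nested nodes, so Depth() == n.
std::unique_ptr<Node> MakeChain(std::size_t n) {
  auto root = std::make_unique<Node>();
  Node* cur = root.get();
  for (std::size_t i = 0; i < n; ++i) {
    cur->children.push_back(std::make_unique<Node>());
    cur = cur->children.back().get();
  }
  return root;
}

// Returns the deepest node under `n` and stores its depth (relative to `n`)
// in `depth_out`.
Node* Deepest(Node& n, std::size_t& depth_out) {
  Node* best = &n;
  std::size_t best_depth = 0;
  for (auto& child : n.children) {
    std::size_t d = 0;
    Node* candidate = Deepest(*child, d);
    if (d + 1 > best_depth) {
      best = candidate;
      best_depth = d + 1;
    }
  }
  depth_out = best_depth;
  return best;
}

// Simplified "crossover": attaches the whole donor below the deepest node of
// `dst` and returns the new depth, Depth(dst) + 1 + Depth(donor).
std::size_t GraftAtDeepest(Node& dst, std::unique_ptr<Node> donor) {
  std::size_t depth = 0;
  Node* at = Deepest(dst, depth);
  at->children.push_back(std::move(donor));
  return depth + 1 + Depth(*at->children.back());
}
```

Two inputs of depth 60, each comfortably under a limit of 100, combine into one of depth 121. Raising the limit only moves the threshold, because repeated grafting can keep adding depth.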

With that said, is there any way to enforce a depth limit at the mutation level that I may have overlooked? In other words, if LPM detects during mutation that an add/clone/copy would push the mutated message past a user-provided maximum depth, can that operation be skipped in favor of mutating a shallower field? I believe this would:

  • Prevent any chance of creating messages that are too deep to deserialize, thus preserving coverage between fuzzer restarts
  • Prevent the need to periodically increase the parser's recursion limit
  • Possibly improve overall exec/s by limiting recursion depth during deserialization, fuzz target processing, and mutation
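One possible shape for the proposed check is sketched below, assuming the mutator knows how deep the insertion point is: compute the depth the result would have and refuse the add/clone/copy if it would exceed a user-provided maximum, leaving the caller free to mutate a shallower field instead. `TryAttach` and `max_depth` are hypothetical names and are not part of LPM's API; `Node`, `Depth`, and `MakeChain` are the same illustrative stand-ins used for a message tree.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a message; each child is a nested message field.
struct Node {
  std::vector<std::unique_ptr<Node>> children;
};

// Number of nested levels below `n` (0 for a leaf).
std::size_t Depth(const Node& n) {
  std::size_t best = 0;
  for (const auto& child : n.children) {
    std::size_t d = 1 + Depth(*child);
    if (d > best) best = d;
  }
  return best;
}

// Helper for the usage example: a chain of `n` nested nodes.
std::unique_ptr<Node> MakeChain(std::size_t n) {
  auto root = std::make_unique<Node>();
  Node* cur = root.get();
  for (std::size_t i = 0; i < n; ++i) {
    cur->children.push_back(std::make_unique<Node>());
    cur = cur->children.back().get();
  }
  return root;
}

// Hypothetical depth guard: attaches `subtree` under `parent` (which sits
// `parent_depth` levels below the root) only if the result stays within
// `max_depth`. On refusal it returns false and leaves `subtree` untouched.
bool TryAttach(Node& parent, std::size_t parent_depth,
               std::unique_ptr<Node>& subtree, std::size_t max_depth) {
  assert(subtree != nullptr);
  const std::size_t new_depth = parent_depth + 1 + Depth(*subtree);
  if (new_depth > max_depth) return false;
  parent.children.push_back(std::move(subtree));
  return true;
}
```

Because the check runs before the tree is modified, a rejected mutation costs only one depth computation of the candidate subtree, and nothing deeper than `max_depth` is ever written to the corpus.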

If there isn't already a way to enforce this, I'm curious what you think would be required to implement such a feature. I would be interested in providing a PR to add such functionality.

I've attached a simple example which demonstrates the scenario above.
issue-143-supplement.tar.gz

Thank you.

