Comments (3)

ia0 commented on July 20, 2024

Thank you @robyoder for the report.

This is actually expected behavior. The check_trailing_bits option configures how the trailing bits of the last byte are handled (for inputs of valid length), not the trailing bytes. There is currently no convenience configuration to accept inputs of invalid length. It might be useful to add one; please open an issue if you would benefit from such a feature.

However, there is a way to decode inputs of invalid length using the position of the Length error returned by decode (see documentation). This has very small overhead and can be done with a convenience truncate_decode function:

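use data_encoding::{DecodeError, DecodeKind, Encoding};

// Decodes `input`, falling back to its longest valid-length prefix on a length error.
fn truncate_decode(encoding: &Encoding, input: &[u8]) -> Result<Vec<u8>, DecodeError> {
    match encoding.decode(input) {
        // For a Length error, `position` is where the input has to be cut.
        Err(DecodeError {
            position,
            kind: DecodeKind::Length,
        }) => encoding.decode(&input[..position]),
        // Success and any other error are returned unchanged.
        output => output,
    }
}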

robyoder commented on July 20, 2024

Sure, we can work around it and we will, but probably without running the decode function twice.

My question is this: what is the benefit of having check_trailing_bits for only the cases it currently covers? If some algorithm is generating base32 (or some other base) with trailing bits that are not zeros, isn't it plausible that it would also be generating base32 with trailing characters?

One such algorithm would be a naive secret generator that selects characters at random from the base32 alphabet. The final character in the string may represent 1-4 trailing bits, or a full 5 extra bits. I'm curious what kinds of algorithms check_trailing_bits is designed to accommodate if not one like this.

ia0 commented on July 20, 2024

You're not running the decode function twice. You're calling it twice on invalid input, and the first call returns immediately. As I said, truncate_decode has very low overhead (a few nanoseconds) because the length check is done before everything else; the content of the input is not read at all. This is actually one of the reasons why checking the length is a separate check from checking trailing bits: one doesn't look at the data while the other does.
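To make the distinction concrete, here is a minimal sketch of the two checks for an unpadded base32 input, assuming most-significant-bit-first bit order. The function names are hypothetical and this is not the library's code; it only illustrates that the length check needs nothing but the input length, while the trailing-bits check needs the 5-bit value of the last symbol.

```rust
/// Bits carried by one base32 symbol.
const BITS_PER_SYMBOL: usize = 5;

/// Bits left over after the last full byte, computed from the length alone.
fn leftover_bits(len: usize) -> usize {
    (len * BITS_PER_SYMBOL) % 8
}

/// Length check: valid when the leftover bits do not fill a whole symbol,
/// i.e. no symbol consists entirely of trailing bits.
fn is_valid_len(len: usize) -> bool {
    leftover_bits(len) < BITS_PER_SYMBOL
}

/// Trailing-bits check: needs the 5-bit value of the last symbol, so it has
/// to read the data. Assumes `len` is valid and bits are MSB first, so the
/// trailing bits are the low-order bits of the last symbol.
fn trailing_bits_are_zero(len: usize, last_value: u8) -> bool {
    let mask = (1u8 << leftover_bits(len)) - 1;
    last_value & mask == 0
}

fn main() {
    for len in 0..=8 {
        println!(
            "len {}: {} leftover bits, valid length: {}",
            len,
            leftover_bits(len),
            is_valid_len(len)
        );
    }
    // In the RFC 4648 alphabet, 'E' has value 4 (0b00100) and 'F' has value 5 (0b00101).
    println!("\"ME\" has zero trailing bits: {}", trailing_bits_are_zero(2, 4));
    println!("\"MF\" has zero trailing bits: {}", trailing_bits_are_zero(2, 5));
}
```

Under these assumptions, the valid lengths are those congruent to 0, 2, 4, 5, or 7 modulo 8, with 0 to 4 trailing bits. Lengths congruent to 1, 3, or 6 leave at least one full symbol of trailing bits, which is the "trailing character" case discussed above.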

The customization provided by the library is meant to configure the implementation, not to adapt to use cases. This is because the number of possible use cases is unbounded, while the number of decisions made in the implementation is finite. So the right design is the one the library currently follows: provide fine-grained control over the implementation. It is the user's job to define and use the specification suited to their use case.

The current configuration options were added so that the library can replicate the behavior of other implementations (like the GNU base64 and base32 programs) for compatibility reasons. I wasn't aware that some implementations accept invalid lengths and that some users rely on that. I'll add this as a configuration option (see #29).

The algorithm you describe doesn't need this configuration to be implemented efficiently, because encode_len, which returns the number of base32 characters you need to generate a given number of bytes, never returns an invalid length. Something like the following would work fine:

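use data_encoding::Encoding;

fn generate(base: &Encoding, len: usize) -> Vec<u8> {
    // encode_len gives a valid input length for `len` decoded bytes.
    let input = vec![b'F'; base.encode_len(len)]; // Randomize content.
    // Panics if `base` rejects the content, e.g. if it checks trailing bits.
    base.decode(&input).unwrap()
}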
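As a rough illustration of why a length obtained this way is always valid, the sketch below reproduces the arithmetic for an unpadded 5-bit encoding. The helper `unpadded_encode_len` is hypothetical and stands in for the idea behind encode_len, not for the library's implementation; a padded specification would presumably round up to whole 8-symbol blocks instead.

```rust
/// Symbols needed to encode `len` bytes at 5 bits per symbol without padding:
/// ceil(8 * len / 5). Hypothetical helper, not the library's encode_len.
fn unpadded_encode_len(len: usize) -> usize {
    (8 * len + 4) / 5
}

/// A symbol count is a valid length when fewer than 5 bits are left over
/// after the last full byte.
fn is_valid_len(symbols: usize) -> bool {
    (symbols * 5) % 8 < 5
}

fn main() {
    for len in 0..=10 {
        let symbols = unpadded_encode_len(len);
        // Rounding up adds at most 4 unused bits, so the length check passes.
        assert!(is_valid_len(symbols));
        println!("{} bytes need {} symbols", len, symbols);
    }
}
```

Because the symbol count is the smallest one that holds all the bits, it can never include a symbol made up only of trailing bits. A generator that sizes its output this way only has to deal with non-zero trailing bits in the last symbol.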
