Comments (7)

archer884 commented on August 16, 2024

from harsh.

archer884 commented on August 16, 2024

After digging into this a little bit, that's exactly what the library does, but that's also the exact purpose of guard characters. (Also, although the harsh guard implementation looks very different because it does not allocate an array, it appears to accomplish the same thing.) It looks like the difference between the harsh implementation and the original is that the original explicitly throws an error in the event that the input hash contains any non-alphabet characters even outside the guarded region.

Edit: No, that's not right. The characters in your test case ARE valid characters in the alphabet. Well. I'll keep poking at it. >.<

archer884 commented on August 16, 2024

All right, here's what I found.

Your test case appends garbage to the rear of the hashid. Harsh is able to correctly decode this value by discarding the garbage data, where both hashids.ts and hashids.net will return an empty result instead. The reason that these other implementations (I have not examined any others) return an empty result is that they re-encode the decoded result and assert that it must equal the original value to be decoded. That is, ejRe1234 might be decoded as [12345] (it's not, but just for example), and the library will then attempt to re-encode [12345] and come up with jjjj as an answer. The resulting mismatch tells the library that it failed to decode the correct value.

As I said, harsh returns the correct value because it does not attempt to read from the garbage data but, instead, throws that away. I have not been able to think of a way that this actually causes harm, because it's not as though the added garbage is changing the value.

If you can come up with a test case where appending data outside the guarded region causes the value to be changed, I can see adding the fix for this, but the fix itself is very expensive. It roughly doubles the cost of decoding a hashid, which is already comparatively expensive to do (because the hashids algorithm involves, as far as I can tell, some unavoidable memory allocation). Obviously, some people may consider that to be worthwhile, but I don't know that I want to force it on people if they don't want to do it.

One possibility I thought of is the addition of some kind of convenience method that wraps the other method and performs the assert process on the result. Still not sure that's worth anything in terms of reliability, though.
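
A minimal sketch of what such a wrapper could look like. It uses a toy codec (hex values joined by `-` and closed by a `.` guard), not the hashids algorithm, and the names `encode`, `decode_lenient`, and `decode_strict` are hypothetical rather than part of harsh's API. The only point is the round-trip check: decode, re-encode, and accept the result only if the re-encoded string equals the input.

```rust
// Toy codec for illustration only. This is NOT hashids and NOT the harsh API.
const GUARD: char = '.';

/// Encodes values as lowercase hex joined by '-', terminated by a guard character.
fn encode(values: &[u64]) -> String {
    let body: Vec<String> = values.iter().map(|v| format!("{:x}", v)).collect();
    format!("{}{}", body.join("-"), GUARD)
}

/// Lenient decode: reads up to the first guard and silently drops anything after it.
fn decode_lenient(input: &str) -> Option<Vec<u64>> {
    let end = input.find(GUARD)?;
    input[..end]
        .split('-')
        .map(|part| u64::from_str_radix(part, 16).ok())
        .collect()
}

/// Strict decode: accepts the result only if re-encoding reproduces the input exactly.
fn decode_strict(input: &str) -> Option<Vec<u64>> {
    let values = decode_lenient(input)?;
    if encode(&values) == input {
        Some(values)
    } else {
        None
    }
}
```

With this toy codec, `decode_lenient` accepts an encoded ID with extra characters appended after the guard, while `decode_strict` rejects it, at the price of one additional encode per decode.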

I'd appreciate your thoughts in this matter.

sam701 commented on August 16, 2024

I think it is already good as it is. It is just worth knowing about this behavior, though. It would definitely be great to have some kind of checksum, but not necessarily at the cost of performance. I agree: re-encoding the decoded value in order to validate the encoded value is not really a solution but rather a workaround that can be implemented by the user, if needed, but not in the library.

I am not familiar with the hashids algorithm. It seems there is only a reference implementation without a formal specification. Would it be possible to return None when garbage data is present?

But if I understand correctly, it is not always possible to determine the garbage. Here is a test where garbage changes the value.

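    use harsh::HarshBuilder;

    let harsh = HarshBuilder::new().length(4).init().unwrap();
    let id = harsh.encode(&[1, 2]).unwrap();
    println!("encoded: {}", &id);
    // "12" is appended after the hashid, yet decode still returns the original values.
    assert_eq!(harsh.decode(id + "12"), Some(vec![1, 2]));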

I would prefer to have None in this test rather than [1,2].

archer884 commented on August 16, 2024

Thanks for your work on this, @sam701. You are correct that there is no spec, but only a reference implementation. It is possible to return None in the case of garbage data, but only by re-encoding as discussed above.

Given this latest test case, I think you're right that the only solution is to include that re-encoding process.

archer884 commented on August 16, 2024

@sam701, I've implemented re-encoding. If the performance cost is concerning to you, I would recommend an alternative encoding mechanism.

https://crates.io/crates/crockford

Crockford encoding (named for its inventor, Douglas Crockford) is much faster and tends to produce fairly human-friendly values. It does not provide salts, nor does it permit encoding multiple values into one string, but at least it's fast. :p
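
For a rough idea of why it is cheap, here is a minimal sketch of Crockford Base32 encoding for a single `u64`. It is written from the published alphabet (digits plus letters, omitting I, L, O and U) and is not taken from the crockford crate; the function name `crockford_encode` is hypothetical.

```rust
// Crockford Base32 alphabet: digits plus letters, omitting I, L, O and U.
const ALPHABET: &[u8; 32] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

/// Encodes a u64 as Crockford Base32, most significant digit first.
/// Illustrative sketch only; not the crockford crate's implementation.
fn crockford_encode(mut n: u64) -> String {
    if n == 0 {
        return "0".to_string();
    }
    let mut digits = Vec::new();
    while n > 0 {
        // Each output character carries the low 5 bits of the remaining value.
        digits.push(ALPHABET[(n & 0x1f) as usize]);
        n >>= 5;
    }
    digits.reverse();
    String::from_utf8(digits).expect("alphabet is ASCII")
}
```

The whole thing is a shift-and-mask loop over one integer. The flip side is that consecutive integers map to visibly consecutive strings, and there is no salt.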

sam701 commented on August 16, 2024

@archer884 thanks for the quick fix! I'd still prefer harsh to crockford because it provides obfuscation and makes it harder to predict the next ID (generated by a Postgres serial column in my case). I suppose this performance penalty is acceptable in decode. A web server mostly performs batch encoding while sending lists of objects and only a little decoding while receiving a few IDs in a request.
