Comments (60)

tarcieri avatar tarcieri commented on May 29, 2024 2

If you're using hashes like this in any sort of security context and have ambiguities in a structured hashing format, an attacker can exploit them to trick you into accepting documents containing e.g. nested content and have it verify as if it was the same as a non-nested message. Second preimage attacks on naive Merkle Trees are an example of this, and it's similar in concept to other attacks regarding confusion of message structure such as packet injection attacks.

If you want to build a naive hashing scheme using cryptographically secure hashes for your own purposes, that's fine, but I would be strongly opposed to the inclusion of naive schemes in cryptography libraries, because people generally expect cryptography libraries to be built for security purposes.

burdges avatar burdges commented on May 29, 2024 1

I'm tired right now, but I think the argument against my alternative goes roughly: you should document your protocols and file formats better by using a more explicit serialization. That could just be a warning somewhere.

newpavlov avatar newpavlov commented on May 29, 2024 1

@burdges
There are two problems with using std::io::Write. First: there is no core::io, so it conflicts with the goal of supporting no_std wherever possible. Second: its methods return Result, while hashes can process any data without an error, so using Write would result in a lot of useless unwraps.

I think the current digest_reader is more than enough for convenient interfacing with the Read implementors.
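
For illustration, a minimal hedged sketch of that second point, with a generic W: io::Write standing in for a hypothetical hasher: every update returns an io::Result that the caller has to unwrap, even though absorbing bytes cannot fail.

use std::io::Write;

// Hypothetical: if the update method came from io::Write instead of Input,
// every call would return an io::Result even though hashing bytes cannot fail.
fn absorb<W: Write>(hasher: &mut W, chunks: &[&[u8]]) {
    for chunk in chunks {
        hasher.write_all(chunk).unwrap(); // useless unwrap on an infallible operation
    }
}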

burdges avatar burdges commented on May 29, 2024

I'd consider this unwise based on previous discussions.

Right now, std::hash::Hasher does not address endianness, making it non-portable. It's okay for HashMap because you can only serialize HashMap as a series of tuples, not as the raw table. It's already not okay for more lossy probabilistic data structures, like a Cuckoo table or Bloom filter, that you likely serialize as the table itself. It's flat wrong for cryptographic hash functions that one uses almost exclusively in protocols or files.

Conversely, all these cryptographic hash functions would destroy the performance of HashMap if used there, so avoid doing that too. SipHasher is already designed to be a sweet spot between being fast and providing the sort of cryptographic assurances HashMap needs when under DoS attacks.

That said, one could provide a version using roughly the endian-hasher crate, unless it's horribly broken.

use std::hash::{Hash, Hasher};
use endian_hasher::*;

/// Private adapter that forwards `Hasher::write` calls into the `Input` trait below.
struct DigestHasher<'a, T: 'a + ?Sized>(&'a mut T);

impl<'a, T: ?Sized + Input> Hasher for DigestHasher<'a, T> {
    // The digest is extracted through the `Input`/`Digest` API, never through
    // `finish`, so panic to catch accidental use (e.g. inside a `HashMap`).
    fn finish(&self) -> u64 { panic!("DigestHasher cannot produce a 64-bit digest"); }
    fn write(&mut self, bytes: &[u8]) { self.0.process(bytes); }
}

pub trait Input {
    /// Digest input data. This method can be called repeatedly
    /// for use with streaming messages.
    fn process(&mut self, input: &[u8]);

    /// Provide data from anything that implements `Hash`, attempting to convert embedded primitive numeric types to little endian.
    fn input_le<H: Hash>(&mut self, hashable: &H) {
        let mut s = HasherToLE(DigestHasher(self));
        hashable.hash(&mut s);
    }

    /// Provide data from anything that implements `Hash`, attempting to convert embedded primitive numeric types to big endian.
    fn input_be<H: Hash>(&mut self, hashable: &H) {
        let mut s = HasherToBE(DigestHasher(self));
        hashable.hash(&mut s);
    }

    /// Provide data from anything that implements `Hash`, leaving primitive numeric types in native endianness.
    ///
    /// Warning: Non-portable, do not use without first correcting the endianness of the input.
    fn input_native<H: Hash>(&mut self, hashable: &H) {
        let mut s = DigestHasher(self);
        hashable.hash(&mut s);
    }
}

As previously, we'd keep DigestHasher private to the digest crate so that it cannot be used with HashMap. And we'd make the finish method panic for good measure.

There is still an issue here in that we trust H: Hash to handle primitive numeric types by calling Hash instances recursively, which it might fail to do. We could lay the blame for such portability bugs with whoever wrote H: Hash perhaps, but they sound tricky to find.

I could hack up a pull request for this if @newpavlov thinks it wise. I think the first question is: should the endian-hasher crate live or die?

tarcieri avatar tarcieri commented on May 29, 2024

Sorry to be this blunt, but this is a terrible idea.

The standard interface only allows 64-bit outputs but there's nothing stopping extra outputs tailored to specific hashes

What? The trait definition stops this, because types. Nor is there any standard notion of how to map a cryptographic hash function's digest onto a 64-bit integer.

But beyond that, cryptographic hash functions utilized by std::hash::Hasher should be PRFs (i.e. keyed by a secret unguessable by an attacker), because hashDoS. There is no point in using a hash function which is both slow (because crypto) and not a PRF (because security).

The fundamental tradeoff is:

  1. A hash function that is as fast as possible but doesn't hash attacker-controlled data. This omits every cryptographic hash function, because they are slow by design: they have to act as random oracles.
  2. A cryptographic PRF keyed by unguessable data to prevent hashDoS.

To me, if your goal is #2, anything slower than SipHash is unacceptable, because you still want a std::hash::Hasher to be as fast as possible given other constraints.

Do you disagree? Can you explain the use case where you want something that's simultaneously slow and insecure?

pedrocr avatar pedrocr commented on May 29, 2024

@burdges the endianness issues are indeed a problem. If there's no way to fix those it's unfortunate. I'm not sure I understand why this happens. Are primitive types hashed by feeding the hasher one u8 at a time? Because the Hasher interface includes everything from u8 to u128, it would seem that as long as Hash implementations use the full type it should be fine. Why doesn't that work?

I'll have a look if your suggestion works for my use case.

@tarcieri What? The trait definition stops this, because types. Nor is there any standard notion of how to map a cryptographic hash function's digest onto a 64-bit integer.

Nothing stops me from doing a .finish_256bit() and having that give me the SHA-256 of my struct. It's not part of the trait, but I don't care; I just want to be able to use the Hash API as the plumbing to get all the values to the Hasher.

You're assuming I want to use the normal Hasher API with SHA256 in a HashMap or something like that. That would indeed be a bad idea; SipHash is strictly better in that application. What I need is a longer hash that uniquely identifies a given struct over time, for caching and serialization. Think of how git uses hashes (content addresses), not how hash tables use hashes. Here's my current code:

https://github.com/pedrocr/rawloader/blob/c0a017590cc94d21416cbe4f4fe1fce84f4daaf6/src/imageops/mod.rs#L206-L225

I generate hashes that represent the state of the image pipeline up to that point as a function of its settings. I'm currently using MetroHash but I want at least 128 bits and a crypto hash for this. I'm only using the Hash trait for the convenience of #[derive(Hash)] and friends. But maybe I just need a #[derive(CryptoHash)].

burdges avatar burdges commented on May 29, 2024

It works so long as people use Hasher recursively, but if any impl of Hash casts a &[u64] to a &[u8] then it'll break. And someone might wind up doing this for performance if they mistakenly examine performance in unoptimized builds.

You can always locally employ a version of DigestHasher wrapper that makes a cryptographic hash function look like a Hasher, either with or without my endianness fix. I think the question is if my input_le and input_be functions make sense.

In fact, I doubt #[derive(Hash)] has stabilized field order, which provides a much better argument against adding my input_le and input_be functions to the Input trait. You could warn people that they should implement Hash themselves, not derive it.

I think those are the arguments against input_le and input_be: bad impls of Hash break them. #[derive(Hash)] might not have stabilized field order. And you can document your protocol better by using a more explicit serialization. Iffy but not necessarily fatal. I dunno.

pedrocr avatar pedrocr commented on May 29, 2024

@burdges It seems the standard implementation does just that:

https://github.com/rust-lang/rust/blob/c6afde6c46d167c9c75389b887f1d2498aeef3e4/src/libcore/hash/mod.rs#L506-L510

So it seems Hash is just too fundamentally broken for the serialization use case. Is there any derivable trait somewhere that can help with that? Something that just recursively hashes the underlying bytes of all types with a fixed endianness would be enough.
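
As a hedged illustration of why that breaks portability (assuming the default Hasher::write_u32 forwards native-endian bytes, as the std default methods do), a toy hasher that just records the byte stream it is fed shows the architecture dependence:

use std::hash::{Hash, Hasher};

// Toy "hasher" that records the raw byte stream a real digest would see.
#[derive(Default)]
struct Recorder(Vec<u8>);

impl Hasher for Recorder {
    fn finish(&self) -> u64 { 0 }
    fn write(&mut self, bytes: &[u8]) { self.0.extend_from_slice(bytes); }
}

fn main() {
    let mut r = Recorder::default();
    0x0102_0304u32.hash(&mut r);
    // Little-endian targets record [4, 3, 2, 1]; big-endian targets record [1, 2, 3, 4],
    // so the same value would digest differently across architectures.
    println!("{:?}", r.0);
}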

burdges avatar burdges commented on May 29, 2024

Yikes, I completely missed that! I'm tempted to raise that as a libs issue. In any case, I suppose serde should be considered the right way to do this for now.

pedrocr avatar pedrocr commented on May 29, 2024

@burdges That's very interesting; how would you use serde for this? All my structs already implement Serialize and Deserialize. Is there a standard way to just feed that to SHA256 without first going to YAML or something like that?

tarcieri avatar tarcieri commented on May 29, 2024

@pedrocr perhaps something like this is what you want:

https://github.com/cryptosphere/objecthash-rs

Note that I never wrote a procedural macro for it. I also intend on doing something slightly different and a bit more fleshed out soon.

pedrocr avatar pedrocr commented on May 29, 2024

@tarcieri yes, this is precisely my use case, thanks. Wouldn't it be easier to just write a serde Serializer that feeds a Digest? That way #[derive(Serialize)] is all that's needed and any crypto hash can be used?

tarcieri avatar tarcieri commented on May 29, 2024

@pedrocr it may be possible to finagle it through serde visitors.

I would advise against using it for now though. I'll have a better library out soon.

pedrocr avatar pedrocr commented on May 29, 2024

@tarcieri it seems objecthash solves a bigger problem of having redactable structs. For me that's not an issue so maybe just writing a Serializer is simpler.

burdges avatar burdges commented on May 29, 2024

Yes, even using the bincode crate still involves an allocation, but one could write a Serializer for digests to avoid allocation, maybe even one that agreed with the results of bincode.

As an aside, I'm pretty happy doing all my cryptographic hashing manually so far, because frequently if I need a cryptographic hash it actually needs data from multiple sources, or maybe shouldn't cover all the data present in the struct provided. YMMV

tarcieri avatar tarcieri commented on May 29, 2024

@pedrocr as is, the Rust implementation doesn't support redaction, but yes, redaction/Merkle inclusion proofs are one of the nice features you get out of a hashing scheme like that.

pedrocr avatar pedrocr commented on May 29, 2024

@burdges cool, I'll have a look at this then. Would a PR with something like an adapter between Serializer and Digest make sense?

As for doing it manually I was hoping to do it automatically as these structs are specifically designed to be all the settings for the image operation and nothing more. We'll see :)

burdges avatar burdges commented on May 29, 2024

I found an easy way: you impl io::Write for a local wrapper struct Inputer<'a, I: 'a + ?Sized>(&'a mut I) where I: Input; and provide a method with a Serialize argument that calls bincode::serialize_into with the Inputer wrapper.

It'll look vaguely like what I wrote above, but using io::Write instead of Hasher and serde::Serialize instead of Hash. Now serialize_into should avoid the Vec allocation in bincode::serialize.
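
A rough sketch of that wrapper, assuming the Input trait as defined above and bincode 1.x's serialize_into (which accepts any io::Write):

use std::io;

// Local wrapper that forwards every serialized byte into a digest.
struct Inputer<'a, I: 'a + ?Sized + Input>(&'a mut I);

impl<'a, I: ?Sized + Input> io::Write for Inputer<'a, I> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.0.process(buf);
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> { Ok(()) }
}

// Feed any Serialize value straight into a digest, without an intermediate Vec.
fn input_serialized<I, S>(digest: &mut I, value: &S) -> Result<(), bincode::Error>
where
    I: Input,
    S: serde::Serialize,
{
    bincode::serialize_into(Inputer(digest), value)
}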

Also, your original question could now be: why not implement io::Write? Or could Input even be replaced with io::Write?

pedrocr avatar pedrocr commented on May 29, 2024

@burdges sounds good, avoids having to write a full Serializer. It does add more indirection though.

pedrocr avatar pedrocr commented on May 29, 2024

@burdges also is bincode guaranteed to always send the same bytes? Or can it do things like reorder fields between versions? It may be safer to just implement the Serializer to guarantee no funny business happens that invalidates the hash.

tarcieri avatar tarcieri commented on May 29, 2024

@pedrocr you seem to be running afoul of the canonicalization problem. One of the nice things about objecthash-like schemes is you don't need to canonicalize prior to hashing. Structurally similar objects will always have the same hash.

pedrocr avatar pedrocr commented on May 29, 2024

@tarcieri see this for my conclusions on the topic:

https://internals.rust-lang.org/t/f32-f64-should-implement-hash/5436/33

My worry in this case is only that the same exact struct will hash differently because bincode changes. Making sure that reordering fields doesn't change the hash for example isn't as important. Since my structs already implement Serialize using that would mean one less derive to worry about.

Having a crate that does derive and allows me to use any hash would be nice though so please do ping me once you have something to test. The other features can't hurt either.

burdges avatar burdges commented on May 29, 2024

Yes, serde explicitly supports non-self-describing formats, naming bincode as the example.

There are various instructions for impls of Serialize and Deserialize to support non-self-describing formats, so those can break bincode of course.

Also bincode never emits anything about the fields besides lengths and contents. I'd imagine bincode would use a feature or a major version update if they broke binary compatibility, but who knows.

pedrocr avatar pedrocr commented on May 29, 2024

@burdges so hashing bincode does seem fairly reasonable. Will have a look at that then.

newpavlov avatar newpavlov commented on May 29, 2024

@pedrocr
Sorry I couldn't join the discussion earlier. As I understand it, your problem can be solved by a separate crate (bincode + some glue code, or @tarcieri's future crate) and you don't currently require any changes in the digest crate? In that case I think this issue can be closed.

burdges avatar burdges commented on May 29, 2024

There is a lingering question as to whether the Input trait should be replaced by std::io::Write or something like that. I haven't thought about it really myself.

pedrocr avatar pedrocr commented on May 29, 2024

It would also be nice if a Serializer was written so that one could hash straight from serde without using bincode. But that would just be a convenience that should probably be behind a feature flag.

tarcieri avatar tarcieri commented on May 29, 2024

Should this issue be closed? I think it's semantically incorrect to implement Hasher for this purpose, and there are plenty of other APIs more appropriate for hashing structured data in a cryptographically meaningful way.

pedrocr avatar pedrocr commented on May 29, 2024

@tarcieri the core of the issue is still unresolved. I've changed the title to reflect that. Right now the best way to use crypto hashes on structs is serde->bincode->Digest which is far from ideal.
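
For reference, a hedged sketch of what that serde->bincode->Digest route looks like (assuming the sha2 and bincode 1.x crates); the intermediate Vec allocation is part of why it is far from ideal:

use sha2::{Digest, Sha256};

// Serialize the whole struct into a buffer, then hash the buffer.
fn hash_struct<T: serde::Serialize>(value: &T) -> Result<[u8; 32], bincode::Error> {
    let bytes = bincode::serialize(value)?;
    Ok(Sha256::digest(&bytes).into())
}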

newpavlov avatar newpavlov commented on May 29, 2024

@pedrocr
Am I correct that you want to add the following method, or something similar, to the Digest trait?

fn digest_serialized<S: Serialize, F: Serializer>(s: S, f: F) -> Result<Output<Self::OutputSize>, F::Error>;

UPD: edited code a bit, removed bincode method

pedrocr avatar pedrocr commented on May 29, 2024

@newpavlov I don't know why you'd want the bincode one; the idea would be to not need bincode at all to hash something that implements Serialize. I also don't think the first one is correct. From what I've seen, what's needed is some sort of DigestSerializer that implements the Serializer trait from serde by feeding all the inputs into a Digest.

pedrocr avatar pedrocr commented on May 29, 2024

Also, I'm not fully convinced the correct solution for this problem involves serde at all. There are very likely cases where you want to serialize a field that you don't want to include in the hash, and vice versa.

newpavlov avatar newpavlov commented on May 29, 2024

@pedrocr
To hash data you first need to convert it to a stream of bytes, i.e. serialize it. I don't think it would be right to choose or develop a serializer inside the digest crate.

So I think it's better to start with a separate crate which will use the Digest trait (or just one hash) and some serialization strategy (be it bincode or something hand-written) to implement a digest version of Serializer.

I don't mind having this crate as part of the RustCrypto project, but I won't be able to work on it, as I currently have other priorities for the project.

P.S.: After reading some docs it seems the code I wrote in the previous comment will not work, as it's impossible to extract a Writer from a serializer and there are currently no traits describing what we need.

tarcieri avatar tarcieri commented on May 29, 2024

As someone working specifically on structured hashing schemes, I'd also suggest trying to collaborate on ones which are reusable across multiple languages instead of making yet another one which is proprietary to Rust.

pedrocr avatar pedrocr commented on May 29, 2024

@newpavlov To hash data you first need to convert it to a stream of bytes, i.e. serialize it

Not exactly: a serialization requires the possibility of deserializing and thus includes more bytes than strictly needed. A Serializer that just feeds a Digest is by definition one-way, so the only thing it needs to do is convert all fields to bytes in order and feed the hash; it doesn't have to do much serialization. But I agree that Serialize is the wrong abstraction for this, as you will not always want to serialize the same fields as you want to hash, and vice versa.

So I think it's better to start with a separate crate which will use the Digest trait (or just one hash) and some serialization strategy (be it bincode or something hand-written) to implement a digest version of Serializer.

Hashing the bincode serialization is trivial to do; that's what I've already done in my project. Publishing that as a mini-crate would be easy if there's interest.

@tarcieri As someone working specifically on structured hashing schemes, I'd also suggest trying to collaborate on ones which are reusable across multiple languages instead of making yet another one which is proprietary to Rust.

Structured hashing schemes are not the same thing as hashing structs. It would be nice if Hash was usable for hashing structs independently of structured hashing schemes. Since that is broken in multiple ways it would probably be nice if there was a usable #[derive(CryptoHash)] somewhere. But since serde->bincode->Digest works well enough it's probably what we'll end up with.

tarcieri avatar tarcieri commented on May 29, 2024

Structured hashing schemes are not the same thing as hashing structs.

"Recursively hashing structs" is pretty much the definition of a structured hashing scheme. However, you seem to be using it to mean "a content hash for serialized bincode" which, to me, is something very different from "recursively hashing structs"

It would be nice if Hash was usable for hashing structs independently of structured hashing schemes.

As covered earlier, this is simply the wrong tool for the job.

Since that is broken in multiple ways it would probably be nice if there was a usable #[derive(CryptoHash)] somewhere. But since serde->bincode->Digest works well enough it's probably what we'll end up with.

So you want to invent your own scheme, which is what I was recommending against. At least think about taking one of these two paths as you invent your own scheme:

  1. Canonicalization: how do you deal with hashing str and String? (i.e. do you apply Unicode canonicalization first?) Someone will want to hash types like HashMap eventually. Make sure there's a canonical representation.
  2. Structured hashing: nicely sidesteps the canonicalization problem while providing a bunch of added value at the cost of relatively little complexity. These schemes can be as fast or faster than canonicalization schemes, because they avoid serializing the data as an intermediate step (a rough sketch follows this list).
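
A rough, non-authoritative sketch of the structured approach from point 2, assuming the sha2 crate; objecthash tags and hashes values along these lines, though the real scheme defines many more type tags:

use sha2::{Digest, Sha256};

// Hash a leaf value together with a type tag for domain separation.
fn leaf(type_tag: u8, bytes: &[u8]) -> [u8; 32] {
    let mut h = Sha256::new();
    h.update([type_tag]);
    h.update(bytes);
    h.finalize().into()
}

// Hash an interior (struct/dict) node over its children's digests.
fn node(children: &[[u8; 32]]) -> [u8; 32] {
    let mut h = Sha256::new();
    h.update([b'd']);
    for child in children {
        h.update(child);
    }
    h.finalize().into()
}

fn main() {
    // A two-field struct hashes as a node over its field hashes, so nesting and
    // field boundaries can never be confused with a flat byte string.
    let person = node(&[leaf(b'u', b"alice"), leaf(b'i', &42u64.to_le_bytes())]);
    println!("{:02x?}", person);
}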

Done correctly you can solve all of the problems I just mentioned and have a format that works across languages. Or you can invent your own thing that's specific to Rust, ala bincode, but please think through the issues that have been addressed in all of the content hashing schemes that have been created in the past.

You are walking well-worn ground here, where many mistakes have been made by past naive attempts and much hard-won knowledge has been accrued. Before you go invent something, please make sure you study the past.

pedrocr avatar pedrocr commented on May 29, 2024

@tarcieri "Recursively hashing structs" is pretty much the definition of a structured hashing scheme. However, you seem to be using it to mean "a content hash for serialized bincode" which, to me, is something very different from "recursively hashing structs"

I've stated multiple times I don't want to use bincode for anything and it's just a hack to get it to work right now. So no, I don't want to hash bincode, all I want is a recursive crypto hash of arbitrary rust structs. I don't want to make any canonical representations of data. I don't want to serialize and then hash. I don't want to invent a hashing scheme, just have a hashing trait that isn't broken like the default Hash.

tarcieri avatar tarcieri commented on May 29, 2024

all I want is a recursive crypto hash of arbitrary rust structs

I don't want to serialize and then hash

Then, by definition, you need an algorithm that knows how to compute cryptographic hashes of structured data.

Better make sure it isn't vulnerable to second-preimage attacks.

pedrocr avatar pedrocr commented on May 29, 2024

@tarcieri as far as I can tell that only applies if you are hashing individual structs and then hashing the concatenated hashes. I don't want that at all. I just want to initialize the hash, pass in all the values recursively, and finalize the hash. This has the same properties as serializing and then hashing without the performance penalty.

tarcieri avatar tarcieri commented on May 29, 2024

You need to domain separate the values somehow, either by length or by structure, and in either case you'll want to encode the types. The former pretty much involves a bincode-like scheme, and the latter an objecthash one.

Otherwise, you'll have ambiguities where different structures compute the same hash.
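
A minimal illustration of the ambiguity, using plain byte buffers rather than a real digest so the sketch stays self-contained: without delimiting, two different field layouts feed an identical byte stream to the hash, and length-prefixing each field removes the collision.

fn naive(fields: &[&[u8]]) -> Vec<u8> {
    fields.concat()
}

fn length_prefixed(fields: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::new();
    for f in fields {
        out.extend_from_slice(&(f.len() as u64).to_le_bytes()); // domain-separate by length
        out.extend_from_slice(f);
    }
    out
}

fn main() {
    let ab: &[&[u8]] = &[b"foo", b"bar"];
    let cd: &[&[u8]] = &[b"foob", b"ar"];
    assert_eq!(naive(ab), naive(cd));                     // different structures, same stream
    assert_ne!(length_prefixed(ab), length_prefixed(cd)); // disambiguated
}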

pedrocr avatar pedrocr commented on May 29, 2024

You need to domain separate the values somehow, either by length or by structure, and in either case you'll want to encode the types.

None of these are an issue for me.

Otherwise, you'll have ambiguities where different structures compute the same hash.

That's how Hash already works and is fine for me.

tarcieri avatar tarcieri commented on May 29, 2024

Ok, so it sounds like you want to abandon security... in which case why are you using a cryptographic hash?

pedrocr avatar pedrocr commented on May 29, 2024

@tarcieri none of these abandon security, but even if they did, I'm using hashes for caching. All I need is a statistical guarantee of non-collision absent an attacker.

pedrocr avatar pedrocr commented on May 29, 2024

@tarcieri I suggest you bring this up with the Rust lib developers then. All I want is a hash calculated the exact same way Hash already works. I only want a crypto hash because 64 bits are not enough collision protection for the application. If there's a way to create collisions easily with that kind of interface, Rust in general is vulnerable to them, and that can be exploited to DoS systems at the very least. You haven't given an example of how that would happen though. All the cases you've pointed out are hashes-of-hashes situations.

tarcieri avatar tarcieri commented on May 29, 2024

I suggest you bring this up with the rust lib developers then.

Hash isn't intended to be a cryptographic hash for identifying content. It's for use in hash-like data structures, where collisions are easily tolerated by the underlying algorithms. The main threat to those is hashDoS, where an attacker can deliberately exploit repeated collisions to the same value. The Rust core developers addressed those by using SipHash, a PRF with a random key.

You're talking about creating a CryptoHash trait, which to me would ostensibly be for the purposes of secure content authentication. If you don't want it to be about security, I would suggest taking "crypto" out of the name.

You haven't given an example of how that would happen though.

These same attacks can happen without hashes-of-hashes. As I pointed out the general attack is one of failure to disambiguate message structures. Packet injection is the same basic idea, without any hashing involved whatsoever. That's a parsing ambiguity, but the same concept applies: the security system is confused about the message structure, allowing an attacker to trick it into doing something it shouldn't.

In my opinion if any messages with different structures can realistically produce the same content hash, the scheme is broken from a security perspective.

pedrocr avatar pedrocr commented on May 29, 2024

The Rust core developers addressed those by using SipHash, a PRF with a random key.

From what I've seen the current implementation doesn't actually randomize the key. But even if it did that's not relevant if it's trivial to generate a collision by just manipulating the structure.

These same attacks can happen without hashes-of-hashes. (...) In my opinion if any messages with different structures can realistically produce the same content hash, the scheme is broken from a security perspective.

I'd agree; I just haven't found a case where that can happen with a setup like that of Hash. I'm curious to see one, as I believe it would be a bug in Hash as well (or in some implementations of the trait).

tarcieri avatar tarcieri commented on May 29, 2024

From what I've seen the current implementation doesn't actually randomize the key.

Hash is just a trait. Some implementations defend against hashDoS and some don't.

According to the documentation, the implementation used by HashMap specifically implements the randomly-keyed cryptographic hash strategy (and uses std::hash::SipHasher I believe):

https://doc.rust-lang.org/std/collections/struct.HashMap.html

By default, HashMap uses a hashing algorithm selected to provide resistance against HashDoS attacks. The algorithm is randomly seeded, and a reasonable best-effort is made to generate this seed from a high quality, secure source of randomness provided by the host without blocking the program
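
A hedged way to check this yourself (assuming current std behaviour, where DefaultHasher::new() uses fixed keys while HashMap's RandomState is seeded from the OS):

use std::collections::hash_map::{DefaultHasher, RandomState};
use std::hash::{BuildHasher, Hash, Hasher};

fn hash_with<H: Hasher>(mut h: H, value: &impl Hash) -> u64 {
    value.hash(&mut h);
    h.finish()
}

fn main() {
    // DefaultHasher::new() uses fixed keys, so this value is stable across runs.
    println!("fixed:  {}", hash_with(DefaultHasher::new(), &"hello"));
    // RandomState (what HashMap::new() uses) is randomly seeded, so this differs per process.
    println!("seeded: {}", hash_with(RandomState::new().build_hasher(), &"hello"));
}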

pedrocr avatar pedrocr commented on May 29, 2024

At least on my machine the hash is not seeded. This always returns the same value:

https://gist.github.com/pedrocr/8d0283bed3f56e5cb6fb9fe0a785f947

But that's not relevant to the point, as I specifically mentioned. If you can generate structs that have the same hash because you manipulate the structure, it doesn't matter that your hash function is randomly keyed. Within the same execution of the program those structs will collide, just with different values. And that's what the bug in Hash would be.

tarcieri avatar tarcieri commented on May 29, 2024

At least on my machine the hash is not seeded.

Use a HashMap and I expect you'll see different results. Hash itself is NOT designed to be a security primitive.

And to reiterate yet again, in the context where Hash security matters (i.e. HashMap and SipHasher), we're operating under a very different threat model where collisions are tolerated. Hash-like data structures naturally accommodate collisions, but are susceptible to an algorithmic attack if they rebalance to the same value repeatedly.

You're conflating threat models here: Hash and CryptoHash operate under different threat models.

In the one I would find acceptable for CryptoHash, collisions for attacker-controlled documents are not acceptable.

I'm not sure this is continuing to be a productive conversation: I would like the threat of ambiguous messages to be solved strategically for anything named CryptoHash. This is a well-studied problem in cryptography with a limited number of solutions.

I would find a hand-rolled naive and ambiguous scheme unacceptable. You haven't made a concrete proposal, but you're downplaying everything you need to do to solve these problems from a security perspective, and that's why I'm worried.

pedrocr avatar pedrocr commented on May 29, 2024

Hash-like data structures naturally accommodate collisions, but are susceptible to an algorithmic attack if they rebalance to the same value repeatedly.

And that's the attack that, if it works with CryptoHash, will also break Hash. HashMap tolerates collisions if they are fully random; after all, it only has 64 bits of key. But if it's possible to generate a collision by just manipulating structure, Hash and CryptoHash are equally broken.

I would find a hand-rolled naive and ambiguous scheme unacceptable. You haven't made a concrete proposal, but you're downplaying everything you need to do to solve these problems from a security perspective, and that's why I'm worried.

My proposal is very simple: do exactly what Hash does but feed it to a crypto hash. I've yet to see an attack that breaks that, but it may exist. If it does, Hash should probably be fixed as well. So far the only case I've seen that's strange is that hashing a zero-sized struct is a no-op. Not even that is exploitable in Rust as far as I can tell. Other than that, things like ("foo","bar") and ("foobar","") already hash to different values. Do you have an example where this breaks down?

tarcieri avatar tarcieri commented on May 29, 2024

But if it's possible to generate a collision by just manipulating structure Hash and CryptoHash are equally broken.

Incorrect. Again, hashDoS is an algorithmic attack that relies on the attacker being able to find large numbers of collisions. Finding a single collision is not catastrophic, because it doesn't help the attacker that much.

Furthermore, the interesting cases for hashDoS are HashMap<T, _> for a single type T. For typical Hash cases we don't need to worry about collisions across types.

With a CryptoHash, which calculates a content hash, any single collision is catastrophic, and we aren't constrained by T and have to worry about collisions across types.

These are totally different threat models you are trying to conflate. Hash is not designed to solve the problem you're proposing.

My proposal is very simple. Do exactly what Hash does but feed it to a crypto hash. I've yet to see an attack that breaks that but it may exist.

The onus is on you to show why your scheme is secure. Otherwise you're shifting the burden of proof. Your (ill-defined) scheme is not secure simply because I haven't found an attack. You need to be able to answer questions like: how is the scheme domain separated everywhere type-by-type? How does it distinguish a Vec<u8> containing the data that would be hashed for some struct from the struct itself?

A secure scheme will be demonstrably unambiguous and free of collisions for arbitrarily structured messages. Anything less is insecure.

This is a much higher bar than Hash.

pedrocr avatar pedrocr commented on May 29, 2024

With a CryptoHash, which calculates a content hash, any single collision is catastrophic, and we aren't constrained by T and have to worry about collisions across types.

If you care about collisions across types then yes, it's easy to generate collisions that are benign in Hash but break CryptoHash. Again, not my use case, but I can see how you'd want that in general, and if there's a simple way to do that then great. objecthash doesn't currently do that completely though, as it hashes the same structure of fields to the same value independently of type. So objecthash(Person{id:0}) == objecthash(Dog{id:0}). Generating extra collisions with Hash isn't actually trivial, but it can be done with things like hash((0u16,0u16)) == hash((0u8,0u8,0u8,0u8)). That's just a trivial bug (i.e., since Hash doesn't care about different types, it doesn't do the same as with Vec and also hash the length). But if there's a standard way to fix that, all the better. I've had a second look at objecthash and opened a bug report on it.

pedrocr avatar pedrocr commented on May 29, 2024

After looking at objecthash what it does is create a hash that's interoperable between languages and thus has some situations that generate collisions on purpose:

  • First{id: 0} and Second{id: 0} will hash to the same value even if one uses u32 and the other u64.
  • Different unicode strings that normalize to the same value have the same hash
  • Structs that have the same fields but in different order will hash to the same value

This makes it easier to have values for which objecthash(a) == objecthash(b) and yet myfunc(a) != myfunc(b). For some applications this is not ideal. Given this, I'd say it would make sense to have a simpler scheme that doesn't have the issues of Hash (e.g., tuples would also hash their length) but just hashes all fields in order with all their contents.

Trojan295 avatar Trojan295 commented on May 29, 2024

hash((0u16,0u16)) == hash((0u8,0u8,0u8,0u8))

I would argue, if it's a bug, because the byte representation of the data is the same. Hash functions work on bytes, not data structures.
Let's say you have a vector in Rust and in Java, both holding exactly the same data. A hash of those should be the same in both languages, even though the underlying implementation of vectors could be different.
If we couple a hash function to the implementation of the data structure in Rust, we would make this hash usable only in Rust. What's even worse, changes to the compiler could affect the resulting hashes.

pedrocr avatar pedrocr commented on May 29, 2024

@Trojan295 I'm not sure if you're arguing that it's a bug or not. (0u16, 0u16) and (0u8, 0u8, 0u8, 0u8) are not holding the same data: one has 2 zeroes, the other 4. The fact that both end up generating 4 zero bytes in memory is an implementation detail. And Rust already does hash([0u16, 0u16]) != hash([0u8, 0u8, 0u8, 0u8]) because the Vec length is already hashed. Only the tuple length isn't.

Trojan295 avatar Trojan295 commented on May 29, 2024

Are you trying to use these structured hashing functions only for internal Rust purposes like HashMap, or do you want to make them usable in a cryptographic manner? For such internal use that's OK, but hashes used in cryptography are mostly sent to other entities. Now if you defined a custom data structure in Rust and applied some hash to it, the other entity would need to know how you calculated the hash of this structure.

Let's take this Vec. I don't know how it's done in Rust, but let's assume that the length is appended to the data and hashed. Then the party on the other side of the wire needs to know how you built the byte stream that was hashed (i.e. that you appended the length rather than, for example, prepending it). It's not simple to unify this across multiple parties.

Generally, I don't think such a feature is required for cryptographic hashes, as in crypto you mostly operate on numbers/bytes and not custom data structures. The way of hashing needs to be well known and unified.

tarcieri avatar tarcieri commented on May 29, 2024

@Trojan295 short of Hash changing to a const-dependent pi-typed interface, this wouldn't work for things like HashMap, which needs a different type signature.

But as I've previously stated, if there's one thing I have a bug up my butt about, it's conflating security domains and concerns, and that's really what I want to avoid.

I am a huge fan of something like a CryptoHash scheme and have already implemented one in Rust and plan on designing and implementing another. I just want to make sure it covers all of the concerns I have. Those concerns are orthogonal to what Hash presently provides.

newpavlov avatar newpavlov commented on May 29, 2024

I will close this issue and create a separate issue in the RustCrypto/utils repository. A crate for cryptographically hashing structs would be a great addition to the project, but as I stated earlier it's better done as a separate crate. I think the concerns raised by @tarcieri are good ones and should be addressed in this crate.

burdges avatar burdges commented on May 29, 2024

I think a utility crate could provide a wrapper type struct HashWriter<'a, I: 'a + ?Sized + Input>(&'a mut I) that impls io::Write along with input_le, input_be, etc. We cannot make it totally generic over serde serializers anyway, because most lack bincode::serialize_into, but folks can work around that on a case-by-case basis.

newpavlov avatar newpavlov commented on May 29, 2024

@burdges
Could you please specify for which use cases an io::Write impl is needed and the current digest_reader is not enough?
