dalek-cryptography / curve25519-dalek
A pure-Rust implementation of group operations on Ristretto and Curve25519
License: Other
The Scalar API and documentation are muddled about what a scalar is supposed to be (bytes? an integer? an integer mod l?). Fix the Scalar API and documentation to make it clear.
From #78, I merged @floodyberry's Karatsuba code, which appears to be correct. However, there is a general issue with a leaky abstraction between curve25519 and the specification for ed25519 signatures which I hadn't noticed until now: ed25519 expects to be able to take any 256-bit integer and use it as a curve25519 scalar (in ℤ/ℓℤ), whereas curve25519 doesn't specify what a scalar's bytes are allowed to be. In my opinion, the solution to this is to:
Due to this issue, signing and verification with ed25519-dalek was failing, so I reverted the merge of #78. Once we mask the bits and add some documentation, we can re-merge @floodyberry's Karatsuba arithmetic.
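The bit-masking mentioned above can be sketched as follows. This is a toy illustration, not dalek's implementation: the helper name is hypothetical, and it only shows the idea of clearing the top bit of a little-endian 256-bit encoding so the value fits in 255 bits before use as a scalar.

```rust
/// Hypothetical helper: clear the high bit of a 256-bit little-endian
/// encoding so the value fits in 255 bits before scalar use.
fn mask_scalar_bits(mut bytes: [u8; 32]) -> [u8; 32] {
    bytes[31] &= 0b0111_1111; // drop bit 255
    bytes
}

fn main() {
    let masked = mask_scalar_bits([0xff; 32]);
    assert_eq!(masked[31], 0x7f); // top bit cleared
    assert_eq!(masked[0], 0xff);  // low bytes untouched
    println!("ok");
}
```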
@floodyberry, any thoughts/comments?
I just read a blog post about this library and was wondering why the AVX2 backend does not use the shuffle! macro instead of the many other intrinsics for re-ordering vector elements.
If there are any performance issues with it, it would really help if bugs could be filed in packed_simd upstream.
We changed enough of our external API in the last version (possibly earlier) that our fuzzers are broken. The fuzzer for DecafPoints, now RistrettoPoints, I believe cannot be usefully recovered, since it expects to directly manipulate FieldElements in an unsound way which we've now made impossible for external applications by making them pub(crate). (It's a little annoying that we can only fuzz the external API.)
For the case of scalar_constructor_accepts_256bit_values:
It doesn't seem all that useful, because I don't see any way this could fail to be the case, but we could change the scalar fuzzer to check that:
// Compute c = a*b (mod l)
let c1 = (&Scalar::from_bits(a_bytes) * &Scalar::from_bits(b_bytes)).reduce();
// Compute c = (a mod l) * (b mod l)
let a_mod_l = Scalar::from_bytes_mod_order(a_bytes);
let b_mod_l = Scalar::from_bytes_mod_order(b_bytes);
let c2 = &a_mod_l * &b_mod_l;
assert_eq!(c1, c2);
There is a nice crate, consistenttime, with various constant-time integer operations, btw.
The current multiscalar API looks like:
pub fn multiscalar_mul<I, J>(scalars: I, points: J) -> RistrettoPoint
where
I: IntoIterator,
I::Item: Borrow<Scalar>,
J: IntoIterator,
J::Item: Borrow<RistrettoPoint>,
This takes an iterator of scalars and an iterator of points, and computes Q = c_1 P_1 + \cdots + c_n P_n. This is super-flexible and very useful (it makes it easy to combine statements), but it doesn't allow precomputation.
To do that, we could create a struct (not sure about naming; let's say MultiscalarPrecomputation for now) that looks like:
struct MultiscalarPrecomputation {
    // backend-specific storage for the static points B_1, ..., B_m
}

impl MultiscalarPrecomputation {
    /// Takes points B_1, ..., B_m
    pub fn new<I>(points: I) -> Self
    where
        I: IntoIterator,
        I::Item: Borrow<RistrettoPoint>,
    { ... }

    /// Computes (a_1*A_1 + ... + a_n*A_n) + (b_1*B_1 + ... + b_m*B_m)
    pub fn multiscalar_mul<I, J, K>(
        &self,
        dynamic_scalars: I,
        dynamic_points: J,
        static_scalars: K,
    ) -> RistrettoPoint
    where
        I: IntoIterator,
        I::Item: Borrow<Scalar>,
        J: IntoIterator,
        J::Item: Borrow<RistrettoPoint>,
        K: IntoIterator,
        K::Item: Borrow<Scalar>,
    { ... }
}
This API would let us cover every combination of static/dynamic points, and because the precomputation is opaque, it means that we can pick what kind of static data we want to use depending on the backend implementation. This would supersede #79.
There should be tests in the constants module that the lookup tables (whose values were copied from Adam's code) were generated correctly. Fixed in https://github.com/isislovecruft/curve25519-dalek/pull/14
Ristretto is a great replacement for P-256 and it would be super cool to be able to use it in C which would automatically expand it to Go/C#/Java, even JS.
There are some password-protecting protocols, like SPHINX or the new PHE, that would benefit from moving to 25519 instead of NIST curves.
Would you consider adding build instructions or standard bindings for languages other than Rust?
The rand API was reworked: see https://github.com/rust-lang-nursery/rand/blob/master/UPDATING.md
We should upgrade to use it. It has a bunch of nice stuff, like a CryptoRng marker trait.
Since this crate provides pure-Rust implementations of cryptographic primitives, the very first question I was trying to find the answer to in the README was whether it had appropriate protection against timing attacks, via constant-time primitives. (And, if so, how it managed to do so using pure Rust code.)
After some digging, it looks like this crate does use appropriate constant-time operations through the subtle crate. That crate, in turn, mentions the need to use the nightly feature for full protection.
Could you please document, in the README for both this crate and the ed25519/x25519 crates (either directly or via a link), the timing-attack resistance and constant-time properties, and in particular the requirement to use the nightly feature to fully achieve those properties?
Thank you for your work on these crates; much appreciated.
In relation to #69, where I'm trying to decompress points created by the ring lib, I get intermittent failures. Some of ring's keys decompress, and some don't. I'm not sure if this means I'm doing something wrong or if this lib is doing something wrong. This gist has a minimal version that demonstrates that at least something isn't right.
The AVX2 backend uses a mix of "portable" vector operations and AVX2 intrinsics. Since the intrinsics work on bag-of-bits types (__m256i) while the Rust SIMD code uses finer types (u32x8), there are conversions, like
t0.0[i] = t0.0[i] + _mm256_blend_epi32(zero, S2.into_bits(), 0b10100101).into_bits();
(yes this is ugly)
Recently something changed to cause the type inference to fail here. I think it may have been this change but I'm not sure (it also doesn't particularly matter).
Until the AVX2 code is patched up, pinning nightly-2018-04-16 in rust-toolchain works.
Stub issue for tracking Elligator 2 implementation.
The Scalar API should be reworked to make it better for general-purpose use. This issue is for tracking those changes (and figuring out what they should be). For instance, it doesn't currently implement the Add or Mul traits, and just has a single multiply_add function.
In my feature branch https://github.com/hdevalence/curve25519-dalek/tree/feature/split-scalar-type I split code off of the Scalar type into an UnpackedScalar type. There's some question here about whether to represent a Scalar as bytes (allowing easy access to the bits for scalar multiplication) or as limbs (for arithmetic). I don't know which is the better thing to optimize for in a general-purpose library.
For now, maybe we should have Scalar implement arithmetic by unpacking, calling an impl in UnpackedScalar, and then repacking? That way, users who just want to implement something can use Scalars and have it work, but people can use UnpackedScalar to avoid overhead if they want to. (This shouldn't be a bottleneck, so it's not really that important -- whatever is simplest and cleanest is best.)
Comments?
The build fails with nightly-2018-05-03 due to cargo changes (see rust-lang/cargo#5475).
Until this is fixed (here or upstream), pinning to nightly-2018-04-30 works.
The point format used in the original ed25519 reference implementation, as well as in Mike Hamburg's curve25519 implementation, is called niels_t after Niels Duif, a graduate student who contributed to the ed25519 paper. However, in the paper, this point representation is not attributed to him; the points are called "precomputed points" and "cached points" respectively.
Hence, in our code, we have PreComputedPoint and CachedPoint. Not only is the naming non-explicit and non-intuitive, it also doesn't give credit to Niels' contributions. @hdevalence and I firmly believe in giving credit where credit is due, especially in cases where the contributor has a perhaps less well-known reputation. It's really important to name the people who do the real work.
Therefore, I propose renaming PreComputedPoint to AffineNielsPoint, and CachedPoint to ProjectiveNielsPoint.
Split from #125: without settling on the design of the precomputation API, I think it makes sense to refactor the multiscalar functions to be traits implemented on the point types. Some of this is already done in the precomputation branch and could be extracted.
One question that would be worth thinking about is whether we want to stay with static dispatch for the iterators or to use dynamic dispatch. The downside of static dispatch is that it bloats the code size (maybe this doesn't matter) and that it prevents the multiscalar trait from being object-safe (which would prevent making opaque precomputation structs).
The Back-Maxwell rangeproof construction we use in dalek-rangeproofs requires scalar inversion. Right now we just have an easy and naive implementation with no optimization, which is nearly as expensive as a full point*scalar operation.
It could be made faster by some combination of:
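For reference, the naive baseline is inversion via Fermat's little theorem: for prime l, a^(l-2) ≡ a^(-1) (mod l), computed by square-and-multiply (the exponent l-2 is public, so the pattern of operations is fixed). A toy sketch with a small stand-in prime; the real implementation works on the ~252-bit group order and can be sped up with a tailored addition chain:

```rust
// Small prime standing in for the group order l; the technique is the same.
const L: u64 = 1_000_003;

fn mul(a: u64, b: u64) -> u64 {
    (a as u128 * b as u128 % L as u128) as u64
}

// Square-and-multiply exponentiation mod L.
fn pow(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    while exp > 0 {
        if exp & 1 == 1 { acc = mul(acc, base); }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

// Fermat inversion: a^(L-2) = a^(-1) mod L, for prime L and a != 0.
fn invert(a: u64) -> u64 { pow(a, L - 2) }

fn main() {
    let a = 12345;
    assert_eq!(mul(a, invert(a)), 1);
    println!("ok");
}
```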
The code is currently messy, ugly, copy-pasta from the ref10 implementation. It could be made much cleaner and easier to understand.
I'm interested in using this crate in an embedded system where I'm compiling with #![no_std]. Skimming over the code, it looks like supporting #![no_std] should be very simple: mostly just s/std/core/, and I think you're nearly there.
Are you interested in supporting #![no_std] usage? If so, I'd be happy to send a PR.
I'm currently trying to get ring's Ed25519 public keys to interface correctly with some system that uses Java/BouncyCastle. For the keys to have the correct format for the SubjectPublicKeyInfo with OID 1.2.840.10045.2.1, I need to append the X and Y coordinates to each other. It would be best to use the CompressedEdwardsY::decompress function to give me an ExtendedPoint from which I can extract the values I need. Currently the solution would just be to copy/paste the current decompress function and replace the last line with what I need.
Anyway.
Could a function be added that gives me access to the raw bytes of a FieldElement?
A robigalia (https://robigalia.org/) developer reported to me via IRC that they were confused about why the Scalar::random function required std. I know it used to be the case that the rand traits we were using required std, but I believe this is no longer the case. I'll try removing it and see what happens.
I've noticed that operators on ExtendedPoint and the like are only implemented on &'a ExtendedPoint. Is there any reason a PR that also implemented them for the concrete types wouldn't be accepted? The types are already Copy, and as far as I know the compiler is better at reasoning about the code when there isn't explicit memory (i.e., a reference) involved.
The documentation says Scalar is an element of ℤ/lℤ, which should be multiplied by a generator of order l. Yet curve25519 typically uses scalars from ℤ/8lℤ, and makes the key creator responsible for clearing the bottom three bits, so that it does not matter if the generator is chosen properly. I have not looked into it deeply, but elsewhere the code gives indications that this is the approach here as well. If so, it is maybe worth tweaking the documentation.
It would be great to have this library (or possibly an x25519-dalek similar to ed25519-dalek, to keep this one low-level and focused) provide the higher-level DH functions curve25519() and curve25519_base() found in donna, so people like myself don't find unique and beautiful ways to royally fudge up the scalar multiplication on their own.
The core::simd module was deleted last week (discussion: rust-lang/stdarch#525), so the AVX2 backend won't build any more.
For now it's possible to fix the build by setting last week's nightly in rust-toolchain:
echo "nightly-2018-07-19" > rust-toolchain
(Branching off of this comment since I think this may be off-topic there)
curve25519-dalek has two field arithmetic implementations: a u32 implementation (using u64s for products) and a u64 implementation (using u128s for products). The u64 backend is selected by the radix_51 feature, which is selected by the nightly feature. Now that u128s are stabilizing (tracking issue), u128s aren't nightly-only. We should:
- remove the #![feature(i128_type)] lines;
- decouple radix_51 from nightly;
- make radix_51 the default feature on x86_64.
The last point seems tricky / impossible, since target-specific or target-default features are currently an open issue in Cargo.
Features are selected before any code is built, so it's not possible to use architecture-dependent #[cfg]s to choose a feature.
One hack (mentioned in the Cargo issue and in the u128 tracking issue) is to use #[cfg]s in a build script to emit feature selections for the main crate. But this doesn't work for us, because we use the build script to generate constants, so by the time we're in build.rs we need to already know which backend to use.
Another approach is to select the backend not by a feature but directly by architecture, or to always use u128s, as in zkcrypto/pairing#80.
However, the problem is that just because u128s are supported by a target's backend (1), or even lowered to instructions that do a 64 x 64 -> 128-bit multiplication (2), doesn't mean it's better to use them. For (1), if a 64-bit multiplication is going to be emulated by 32-bit multiplications anyway, it would be better just to use those directly; and as an example of (2), aarch64 has instructions for computing the high and low parts of a 128-bit product, but they're not necessarily fast. (This considers only speed and not safety, e.g. whether emulated multiplications are constant-time.)
I'm going to rename util to subtle, then make a new module called utils and put load3 and load4 in there. This is a breaking API change, and I'm sorry.
We may need/want to bump the byteorder version to avoid this problem with #[feature]s, pointed out by @ruuda: 1e74cb3#r29958498
Okay, so, bear with me, as this might seem frivolous (and it is): we should have a logo for *-dalek stuff. I'm thinking a simplified line drawing of a dalek with edwards curves as sparkles coming out of its radar-schnozzley blaster thingies.
Anyone have any better ideas?
This is due to an erroneous (and extraneous) import in src/edwards.rs.
I suggest replacing the code pattern "as u128" with "u128::from()".
So in the field_64bit.rs file, code like:
fn m(x: u64, y: u64) -> u128 { (x as u128) * (y as u128) }
Becomes:
fn m(x: u64, y: u64) -> u128 { u128::from(x) * u128::from(y) }
The point of this apparently pointless change is that "as" casts are bug-prone, while from() is reliable. In general, I suggest minimizing the number of "as" casts in a Rust program.
Hi!
Thanks for this clean library. I like this so far.
One thing I am wondering about:
use curve25519_dalek::{constants, edwards, scalar};
// [snip]
struct Foo {
r: scalar::Scalar,
p: edwards::EdwardsPoint,
}
Since most (all?) structs basically have their module name in their own name (edwards -> EdwardsPoint, scalar -> Scalar, ...), would it make sense to make some sort of prelude module, such that I could do
use curve25519_dalek::edwards::prelude::*;
// [snip]
struct Foo {
s: Scalar,
p: EdwardsPoint,
}
? The prelude module can import from the scalar and edwards modules.
Just searching for a way to reduce this kind of duplication a bit. What do you guys think?
Specifically, the functionality here which we don't already have: https://github.com/WhisperSystems/curve25519-java/tree/master/android/jni/ed25519/additions
I'd prefer to do this all at once (as in, all in one minor version), show it to Trevor, and then (if he approves) merge it in, rather than implement incrementally over time.
It's maybe worth providing, or recommending, some side function to validate points. I don't know if this should just rule out obviously bad points, or maybe say (6..246).contains( compressed_point.iter().map(|x| x.count_ones()).sum() ), or even a more interesting randomness test based on FFTs.
I need a function to convert a 32-byte input to a scalar.
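If the intent is "interpret the 32 bytes as a little-endian integer and reduce it mod the group order l", that is what a from_bytes_mod_order-style constructor (mentioned elsewhere in these issues) does. A toy illustration of the reduction with a small stand-in modulus, using Horner's rule from the most significant byte down:

```rust
// Small stand-in modulus; the real group order l is ~2^252.
const L: u64 = 1_000_003;

// Interpret 32 little-endian bytes as an integer and reduce mod L,
// via Horner's rule starting from the most significant byte.
fn bytes_mod_l(bytes: &[u8; 32]) -> u64 {
    bytes
        .iter()
        .rev() // most significant byte first
        .fold(0u64, |acc, &b| (acc * 256 + u64::from(b)) % L)
}

fn main() {
    let mut bytes = [0u8; 32];
    bytes[0] = 0x39;
    bytes[1] = 0x30; // little-endian 0x3039 = 12345
    assert_eq!(bytes_mod_l(&bytes), 12345);
    println!("ok");
}
```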
I tried to compile without default features because I was getting a dependency conflict with rand, but there were pages and pages of compile errors.
Not compiling with u64_backend gives a ton of errors relating to missing constants. Not compiling with std gives a ton of errors relating to Vec and other stuff imported by default from std.
The current multiscalar multiplication traits allow taking borrows or values of scalars and points, which means that it's possible to build up scalars using iterator combinators, and pass them directly into the MSM functions.
However, they don't allow taking Option<Point>. This means that one really cool use case isn't possible: for batch verification, you cannot dynamically decompress points and turn them into tables. This means that all applications using the multiscalar multiplication have to first decompress all of their points into temporary buffers, then pass those buffers into the multiscalar multiplication. However, all of this computation could in principle be inlined, without requiring any allocations, passing directly from the compressed points to the lookup tables.
I'm not sure if there's a good use case for this in constant-time code, but it seems really useful for variable-time code. Maybe the VartimeMultiscalarMul trait should have an extra function that operates on Options?
We should allow serialization/deserialization to different formats (e.g., Edwards Y, Montgomery X, etc).
The current AVX2 code is a hacky mess, for two reasons: it was a proof of concept, and the API it uses is itself unstable. It should be refactored, polished, and have the yolocrypto feature removed.
We would like to implement Decaf, to get a prime-order group without concerns about cofactors. Mike Hamburg kindly pointed us to his code here: https://sourceforge.net/p/ed448goldilocks/code/ci/curve25519-work/tree/
Currently Dalek implements Straus' algorithm, which scales linearly with the number of points. We can use Pippenger's algorithm to scale computation sub-linearly for vectors of more than about 200 points.
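For intuition, here is a toy sketch of Pippenger's bucket method in an additive group of integers mod a small prime standing in for the curve group: scalars are split into c-bit windows, points are dropped into buckets by digit, and each window's Σ j·B_j is recovered with a running suffix sum in O(2^c) additions. None of these names are dalek APIs.

```rust
// Toy stand-in group: integers mod P with modular addition.
const P: u64 = 1_000_003;

fn add(a: u64, b: u64) -> u64 { (a + b) % P }

// Pippenger-style evaluation of Q = sum_i s_i * P_i with c-bit windows.
fn pippenger(scalars: &[u32], points: &[u64], c: u32) -> u64 {
    let windows = (32 + c - 1) / c;
    let mut acc = 0u64;
    for w in (0..windows).rev() {
        // Shift the accumulator up by one window (c doublings).
        for _ in 0..c { acc = add(acc, acc); }
        // Drop each point into the bucket for its digit in this window.
        let mut buckets = vec![0u64; 1 << c];
        for (s, &p) in scalars.iter().zip(points) {
            let digit = ((s >> (w * c)) & ((1 << c) - 1)) as usize;
            if digit != 0 { buckets[digit] = add(buckets[digit], p); }
        }
        // Compute sum_j j*B_j with running suffix sums.
        let (mut running, mut window_sum) = (0u64, 0u64);
        for b in buckets.iter().skip(1).rev() {
            running = add(running, *b);
            window_sum = add(window_sum, running);
        }
        acc = add(acc, window_sum);
    }
    acc
}

fn main() {
    // 3*11 + 5*13 + 7*17 = 217
    assert_eq!(pippenger(&[3, 5, 7], &[11, 13, 17], 4), 217);
    println!("ok");
}
```

The win over Straus is that per window the work is one bucket-add per point plus O(2^c) combining adds, so for large n the amortized cost per point drops as c grows.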
Yesterday @burdges mentioned some ideas about how to try to erase secret data from memory, and some things about zero-on-drop behaviour in Rust. It would be good to have a reliable way to clear secret data.
If I understand correctly, just implementing the Drop trait to zero memory may not be sufficient, for two reasons:
the write may be optimized away (see also);
drop() may never be called, because Rust's memory model allows memory leaks: "memory unsafety is doing something with invalid data; a memory leak is not doing something with valid data"
Other notes that may be of interest: this morning at RWC 2017, Laurent Simon (@lmrs2 ?) presented secretgrind.
It could be quite convenient if Rust had a #[secret_stack] function annotation that guaranteed stack erasure, but this would require a language change.
We need to re-review our code before releasing a 1.0 prerelease.
We can't write a generic impl<T: Identity> Default for T because of the orphan rules, but we should provide a Default impl for every type implementing Identity.
It would be nice to be able to parameterize the basepoint tables, to trade speed against size, and to allow building dalek without the 30K basepoint table (for embedded devices).
I think this would require some kind of multi-stage build process, first building the core EC parts, then rebuilding, using the bootstrap step to compute the tables. The size of the tables could then be adjusted at build time. I don't know if this approach is exactly possible or what would be involved, but it has the upside of allowing us to eliminate the 2KLOC hardcoded tables from the source.