Comments (16)
I started to implement this, but in the process I reconsidered it and now agree with the @nstilt1 comment above.
The backends live only on stack and no different from any other data spilled on stack. As we discussed in the traits issue, we do not provide any guarantees for spilled data (though we try to minimize spillage amount if possible) and zeroization (especially with its current implementation in zeroize
) can negatively impact performance not only by doing unnecessary writes, but also by inhibiting optimizations.
So I think we can close this issue as "not planned".
from stream-ciphers.
I've begun working on the second option... however... with the 1.60 MSRV, Rudra
might not work on it since it is on nightly 1.58. I'll probably add the zeroize functionality last.
from stream-ciphers.
We're about to start making breaking changes to all of the crates in this repo. I think it's fine for your PRs to assume that and an MSRV bump.
from stream-ciphers.
I've made some progress with the second option, but even before adding zeroize
, having a union
for the x86
backends seems to cause avx2
performance to decrease by about 15%. I've got 4 branches where I tried this, and only 1 branch operates at less than 1.0 cpb, but it looks nasty and I don't endorse that branch. Also, I've adjusted the newer branches somewhat so that ChaChaCore<R, V>
lives in autodetect.rs
, so no need to worry about that aspect.
What the branch that has decent performance does:
- the
Backend
struct stores results and ablock
index so that it never has to regenerate results for the sameblock_pos
. With the way that theavx2
backend works, it is very difficult to make the extraction pretty. There might be a way to make it prettier... but is it worth it given the drawbacks (see below)
Some drawbacks of the backend_union
branches:
- they require some backend methods to manage the state (ie
get/set_block_pos()
,get/set_nonce()
) - they currently fail some avx2
rng.rs
andcipher
tests, but the only tests they fail are the ones that involveget_block_pos()
, and this is probably because of how the state increments and how it is managed. - They are generally slower than my
pointers
branch, and they're all based on thepointers
branch so that I don't need to copy/paste a bunch of code... and this is still without adding inzeroize
. I have no idea whybackend_union_2,3,4
are 15+% slower.
If you have any ideas for improving backend_union
or backend_union_2
, I'm open to suggestions. backend_union_3
and 4
just tried to remove Backend::results
and pass a temporary variable into rounds
, but the performance did not change. Otherwise, I've got a proposal:
Proposal
Mayhaps we could use pointers
, and it might be better to have an unsafe write_ks_blocks(&mut self, dest_ptr: *mut u8, num_blocks: usize, results_buffer: &mut Self::Results)
. Another improvement with the functionality of write_ks_blocks()
would be if was capable of generating more than 4 blocks using a while loop.
With a results_buffer
parameter, the backends could reuse the same results_buffer
and call .zeroize()
when the methods are finished with it. In each SIMD backend, gen_ks_block
and gen_par_ks_blocks` both currently fill the same type of buffer.
cipher
might benefit from taking advantage of this and changing a little. Also, I've taken a peek at inout_buf
, and there's a slight chance that... just maybe... inner
could pass a null_ptr
to write_ks_blocks()
, and when write_ks_blocks()
receives a null_ptr
, it could simply overwrite results_buffer
instead of copying it to the pointer. Then cipher could use the pointer to results_buffer
to xor it with the data. I don't know if inout_buf
would work with this... but it would be kinda cool if it did.
(EDIT): darn... I forgot that the avx2 implementation doesn't store the blocks sequentially. So that would not work.
The rng_inner()
method could look something like this:
pub(crate) unsafe fn rng_inner<R>(state: &mut [u32; STATE_WORDS], mut dest_ptr: *mut u8, num_blocks: usize)
where
R: Rounds,
{
let mut backend = Backend::<R>::new(state);
// replace with some generic buffer initialization?
let mut results: [[__m256i; 4]; N] = [[_mm256_setzero_si256(); 4]; N];
backend.write_ks_blocks(dest_ptr, num_blocks, &mut results);
// replace this with a generic method?
state[12] = _mm256_extract_epi32(backend.ctr[0], 0) as u32;
#[cfg(feature = "zeroize")]
{
backend.zeroize();
results.zeroize();
}
}
I feel like this could be okay. I could go ahead and bench this with zeroize
, but I have a feeling the performance won't be as bad as backend_union_2
/3/4. I'll also see about updating /benches
for those branches if you want to be able to compare them quantitatively.
from stream-ciphers.
I benched a new branch (zeroize_simd
) that essentially goes with the first option, just zeroizing after generation, and the fill_bytes()
performance for avx2 ranged from 1.01 to 0.99 cpb, which beats the 3 neater backend_union_X
branches. It does not beat the nasty backend_union
branch. That method might look a little better with recursion. Will see what I can do.
from stream-ciphers.
Had a bunch of benchmarks here, but TL; DR: option 2 is more desirable now that it is working
from stream-ciphers.
Alright. Sorry for my complaining. I just didn't like that the first attempt at Option 2 resulted in Cipher's 1.6 cpb performance, even though it used pretty much the exact same code as before.
I've been working on backend_union_update_state
a little more. I was able to fix the RNG's get_word_pos()
issues, and now the Cipher is still failing seek tests. Will hopefully have fixed it by tomorrow
from stream-ciphers.
I wrote about it previously in different issues, but I think that in the case of "flat" types (i.e. types which do not reference "outside" memory) we can use the following implementation:
impl Drop for Foo {
fn drop(&mut self) {
let n = core::mem::size_of::<Self>();
unsafe {
core::ptr::write_bytes(self, 0, n);
// blackbox `asm!`
}
}
}
from stream-ciphers.
Would that be suitable for a ChaChaCore
struct that contains a union
? I've added the ZeroizeOnDrop
code, but judging by the looks of your suggested implementation... it would be a lot less code than having to determine which part of the union is being used.
And what of the ManuallyDrop
s in the union
? Is it necessary to call ManuallyDrop::drop()
on the union field that is in use? Or maybe even just calling it on any field since they should occupy the same memory?
from stream-ciphers.
ManuallyDrop
s are required by current implementation of union
. IIRC we do not have any actual Drop
s on variants.
from stream-ciphers.
Your ZeroizeOnDrop implementation seems to be far superior to a regular implementation of ZeroizeOnDrop. I've gone ahead and cleaned up my working branch a little and got it to pass some tests, but I'm not sure if I would be able to add that ZeroizeOnDrop impl on my own. I could either try to merge that branch with #333 or I could make a separate PR
from stream-ciphers.
Even though I made code for this issue, it seems that it would be a wasted effort to rework the backends given that typical constructor methods result in a stack-allocated structure, such as any constructor that ends with:
Self {
...
}
With constructors returning that and trying to run let mut private_struct = Box::new(SomeCryptoStruct::new(...))
would likely result in a stack-allocated structure being copied/moved onto the heap, rather than allocating it on the heap... meaning it might be pointless to make ChaChaCore own its temporary buffers in an attempt to be OCD about zeroizing data.
There is a way to ensure that all allocated data stays on the heap using a type and macro such as
#[cfg(feature = "alloc")]
type CfgBoxed<T> = Box<T>;
#[cfg(not(feature = "alloc"))]
type CfgBoxed<T> = T;
/// Defines a new instance of a data structure that is conditionally on the heap, based on whether the `alloc` feature is enabled.
#[macro_export]
#[cfg(feature = "alloc")]
macro_rules! cfg_new_boxed {
($data:expr) => {
$crate::Box::new($data)
};
}
/// Defines a new instance of a data structure that is conditionally on the heap, based on whether the `alloc` feature is enabled.
#[macro_export]
#[cfg(not(feature = "alloc"))]
macro_rules! cfg_new_boxed {
($data:expr) => {
$data
};
}
// and then use it in a constructor like so
pub struct Test {
a: u64,
b: [u32; 16]
}
impl Test {
pub fn new(value: &u64) -> CfgBoxed<Self> {
let mut result = cfg_new_boxed!(Self { a: 0, b: [0u32; 16] } );
result.a = *value;
result
}
}
While this could work, every crypto crate would "need" to implement these types of constructor methods... but it would kind of be a waste of time because
a) literally every crypto crate would "need" to do something like this and
b) a better solution would be proper stack bleaching. Performance-wise, and the result would be better. It would be especially better if it was supported natively with the LLVM and Rust, like that old RFC suggests.
Part of the reason I wanted to consider this route is because the eraser
crate, as I understand it, runs functions on the heap. This route would probably be a little better than running functions on the heap, aside from the sheer number of crates that would "need" to be modified.
I'm fine if we close this issue—not all code is meant to make it to production. But if y'all somehow would still like code from the old branch I can see about working it into a new branch based on the current repo.
from stream-ciphers.
The most straightforward way to impl it would be to add zeroize
-gated Drop
impls on any relevant structs in chacha20::backends::*
which take care of clearing out the intermediate state.
That wouldn't wipe all of the state that's left over on the stack, but that's not something we generally do for any of our cryptographic implementations.
a better solution would be proper stack bleaching
Yep
from stream-ciphers.
As I wrote above, I think a better solution will be to use the zeroize_flat_type
function. Unfortunately, it was released in v1.8.0 which got yanked because of unrelated changes. Maybe we should release v1.7.1 with it?
from stream-ciphers.
Yeah, I've been meaning to redo the zeroize
release with an optional simd
feature which avoids the MSRV jump. Hopefully this weekend.
from stream-ciphers.
zeroize
v1.8.1 is out with zeroize_flat_type
: https://docs.rs/zeroize/1.8.1/zeroize/fn.zeroize_flat_type.html
from stream-ciphers.
Related Issues (20)
- released cfb-mode depends on [email protected] HOT 3
- Please publish aes-ctr v0.5 to crates.io HOT 2
- request for rabbit cipher HOT 1
- ctr: block size vs counter size naming HOT 3
- XChaCha20 unnecessarily limits keystream to 256gb HOT 6
- Missing algorithms HOT 4
- Remove circular dependencies on `aes` crate HOT 1
- why chacha20 seek only works up to 2^37 while xchacha20 seek works up to 2^62? HOT 1
- out of range for slice of length 16 HOT 5
- chacha20: Add wide (4-block) AVX2 impl
- Port chacha20 SIMD backends
- Locking `zeroize` to `<=1.5` for `chacha` prevents compiling with `num-bigint-dig v0.8.1` (for `rsa 0.6.0-pre`) HOT 3
- version 0.9.0 does not compile on Arduino architecture HOT 1
- chacha20: SIGSEGV in CI HOT 8
- Make the neon feature available for ChaCha20 HOT 3
- Does it allow to call decryption method explicitly? HOT 1
- Add more test vectors to salsa20 ? HOT 8
- chacha20: 64-bit counter support HOT 3
- No performance improvements with `-Ctarget-feature=+neon` on `arm64` HOT 6
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from stream-ciphers.