
Comments (3)

colmmacc commented on May 21, 2024

Entropy in s2n, an update and a proposal

I've spent the last week or two reading up in more detail on current developments with PRNGs, thanks to Stan from Intuit for sending me some good references. Based on everything I've read so far, I'd like to start with a proposal.

Quick refresh on some of the background

s2n uses randomly-generated data in a few places:

  • The random data sections in the client/server hello handshake messages
  • Explicit IVs for CBC mode ciphers in TLS1.1+
  • The starting value for our nonce when using AEAD ciphers
  • (TBD) ECDSA signatures
  • Generation of ephemeral Diffie-Hellman parameters

In all but the case of Diffie-Hellman parameters, the entropy data is effectively "public" and is visible directly on the wire in plaintext form. The theoretical worst case for entropy use is a cipher suite such as TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA, which will require all 5 types of randomly-generated data usage. In particular, because each record requires randomly generated data (for the explicit IV), our source of entropy can quickly become the main performance bottleneck in s2n.

Can we simply use /dev/urandom or getrandom()/getentropy() ?

I've spent some time profiling both reads from /dev/urandom and the getrandom()/getentropy() system calls, and I couldn't find a significant performance difference. In all cases they perform poorly when faced with many contended reads from different threads. Even if these access patterns could be made lock-free within the kernel, system calls come with their own performance penalties: even the very fastest system calls seem to incur a ~10 microsecond penalty, which is quite a lot in the context of sending a TLS record.

Still, getrandom()/getentropy() are better to use where available, as they avoid fatal error cases around mis-configured chroot jails and file descriptor exhaustion.
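A minimal sketch of that preference order, assuming a Linux glibc that exposes getrandom() in <sys/random.h>; the function name and error handling here are illustrative, not s2n's actual code:

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>
#if defined(__linux__)
#include <sys/random.h>
#endif

/* Fill buf with n random bytes: prefer getrandom(), and fall back to
 * reading /dev/urandom. Returns 0 on success, -1 on failure. */
int fill_random(unsigned char *buf, size_t n)
{
    size_t done = 0;

#if defined(__linux__)
    while (done < n) {
        ssize_t r = getrandom(buf + done, n - done, 0);
        if (r < 0) {
            if (errno == EINTR)
                continue;
            if (errno == ENOSYS)
                break; /* kernel too old: fall back to /dev/urandom */
            return -1;
        }
        done += (size_t)r;
    }
    if (done == n)
        return 0;
#endif

    int fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0)
        return -1; /* e.g. a chroot jail without /dev, or fd exhaustion */
    while (done < n) {
        ssize_t r = read(fd, buf + done, n - done);
        if (r <= 0) {
            if (r < 0 && errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        done += (size_t)r;
    }
    close(fd);
    return 0;
}
```

Note that the /dev/urandom path is exactly where the chroot and file-descriptor failure modes live, which is why the system-call path is preferred when it exists.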

Can we use a cache to alleviate the bottleneck?

An early version of s2n had a thread-local 4K buffer/cache of data read from /dev/urandom. This drastically reduced contention on the kernel, but it had two other problems:

  1. It means that performance could be quite peaky: occasionally, when the buffer runs out, an s2n operation would appear to pause - which is not ideal.
  2. The buffer was not fork() safe. The BSD/Mac family has the minherit() system call, which can be used to mark areas of memory that should be zeroed on fork(), but this isn't portable to Linux, so we have no robust mechanism to ensure that the same buffer isn't being used in two processes.

    The pthread_atfork() callback can be used, but clone() can be called directly, which bypasses it. getpid() can be used as a guard, but this does not protect against PID caching, or against "double fork" problems where a grand-child process may end up with a duplicate pid. getpid() and getppid() can be used together as a guard, but this is not safe against "triple fork" problems (although a grandchild and a great-grandchild each having pid collisions is unlikely, it is possible), and both guards introduce all of the context switch costs that come with system calls.

Proposal: move to per-connection CSRNG streams

In s2n we have s2n_connection objects, which application authors are encouraged to pool and re-use across many TCP connections/sessions. Part of the s2n API contract is that a connection may only be safely accessed by two threads at a time: a single reader and a single writer.

This makes the s2n_connection object a good place to keep some state around entropy generation. I'd like to propose adding two new pieces of state to the object:

struct s2n_connection {
    ...
    s2n_entropy_generator public;
    s2n_entropy_generator private;
};

Each entropy generator would be periodically seeded from /dev/urandom, or getrandom()/getentropy() as appropriate, and would be used for any randomly generated data associated with that s2n_connection.

Seeding would occur when s2n_connection_new() is called, and when a hard-coded limit on the entropy volume is hit. To avoid performance stalls, we could choose to reseed as part of s2n_connection_wipe(), which is out of line with the protocol handling phase. So for example: we might put a hard limit of 2^62 bytes before reseeding MUST occur, but opportunistically perform it within s2n_connection_wipe() if more than 2^60 bytes have been generated.
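The two-threshold scheme amounts to a byte counter checked at two points. A hypothetical sketch, assuming the limits are 2^62 and 2^60 bytes; the names and struct are illustrative, not s2n's actual API:

```c
#include <stdint.h>

/* Hypothetical reseed bookkeeping for one per-connection generator. */
#define RESEED_HARD_LIMIT (UINT64_C(1) << 62)
#define RESEED_SOFT_LIMIT (UINT64_C(1) << 60)

struct entropy_generator {
    uint64_t bytes_generated; /* bytes produced since the last seeding */
};

/* Checked on the hot path: reseeding MUST happen before any more output. */
static int must_reseed(const struct entropy_generator *g)
{
    return g->bytes_generated >= RESEED_HARD_LIMIT;
}

/* Checked in s2n_connection_wipe(): reseed opportunistically, out of
 * line with protocol handling, so the hard limit is rarely hit. */
static int should_reseed_on_wipe(const struct entropy_generator *g)
{
    return g->bytes_generated >= RESEED_SOFT_LIMIT;
}
```

The gap between the two limits is what makes the stall-avoidance work: a connection would have to generate 2^62 - 2^60 bytes without ever being wiped before the in-line reseed could trigger.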

Additionally, we will use minherit() and pthread_atfork() to zero a small piece of global state. Reseeding will also be forced whenever this global state is detected to be zero.
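An illustrative sketch of that zero-on-fork guard: a small global that is nonzero while the process is seeded, zeroed in the child by a pthread_atfork() handler (minherit(..., MAP_INHERIT_ZERO) would do the same job on BSD without the handler). The names here are hypothetical:

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Nonzero while this process's generators are validly seeded. */
static unsigned char fork_guard[16];

/* Runs in the child after fork(), wiping the guard there only. */
static void zero_fork_guard(void)
{
    memset(fork_guard, 0, sizeof(fork_guard));
}

void mark_seeded(void)
{
    static int registered;

    if (!registered) {
        pthread_atfork(NULL, NULL, zero_fork_guard);
        registered = 1;
    }
    memset(fork_guard, 0xff, sizeof(fork_guard));
}

/* A zeroed guard means we are in a fresh child (or were never seeded),
 * so reseeding must be forced before producing any output. */
int must_force_reseed(void)
{
    unsigned char acc = 0;
    size_t i;

    for (i = 0; i < sizeof(fork_guard); i++)
        acc |= fork_guard[i];
    return acc == 0;
}
```

As the earlier discussion notes, this is best-effort: a direct clone() bypasses pthread_atfork(), which is why minherit() is used as well where it exists.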

Why public and private?

Since there is still some small risk of a very broken application fork()/clone()-ing and copying s2n_connection object state between processes, we do our best to be defensive against this by keeping separate the "private" entropy we use (e.g. for DH parameters) and the "public" entropy that is visible on the wire.

Which PRNG should we use?

Since the data we randomly generate must be cryptographically secure and unpredictable, even when previous outputs are public, we must use a CSRNG; arc4random, Yarrow, and the DRBG family are all examples. Since we would like to add support for ChaCha20 as a stream cipher, and ChaCha20 can be used as a CSRNG, I'm proposing that we use ChaCha20 here. ChaCha20 is available in the public domain, and BSD-licensed references are available on using ChaCha20 as a CSRNG (see http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libc/gen/arc4random.c?rev=1.28&content-type=text/x-cvsweb-markup for example).

A brief summary of the algorithm for using ChaCha20 as a CSRNG is:

  1. s is the initial 32-byte (256-bit) seed
  2. compute the 64-byte quantity x = ChaCha20_s(0)
  3. split x into two 32-byte quantities s' and k
  4. replace s by s', and
  5. use k as output
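The steps above can be sketched directly in C. This is an illustrative, unoptimized implementation of the construction (using an all-zero nonce), not s2n's actual code:

```c
#include <stdint.h>
#include <string.h>

#define ROTL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
#define QR(a, b, c, d)                      \
    do {                                    \
        a += b; d ^= a; d = ROTL(d, 16);    \
        c += d; b ^= c; b = ROTL(b, 12);    \
        a += b; d ^= a; d = ROTL(d, 8);     \
        c += d; b ^= c; b = ROTL(b, 7);     \
    } while (0)

/* One 64-byte ChaCha20 block with an all-zero nonce. */
static void chacha20_block(const uint8_t key[32], uint64_t counter,
                           uint8_t out[64])
{
    uint32_t in[16], x[16];
    int i;

    /* "expand 32-byte k" constants */
    in[0] = 0x61707865; in[1] = 0x3320646e;
    in[2] = 0x79622d32; in[3] = 0x6b206574;
    for (i = 0; i < 8; i++)
        in[4 + i] = (uint32_t)key[4 * i] |
                    ((uint32_t)key[4 * i + 1] << 8) |
                    ((uint32_t)key[4 * i + 2] << 16) |
                    ((uint32_t)key[4 * i + 3] << 24);
    in[12] = (uint32_t)counter;
    in[13] = (uint32_t)(counter >> 32);
    in[14] = 0;
    in[15] = 0;

    memcpy(x, in, sizeof(x));
    for (i = 0; i < 10; i++) { /* 20 rounds = 10 double rounds */
        QR(x[0], x[4], x[8],  x[12]);
        QR(x[1], x[5], x[9],  x[13]);
        QR(x[2], x[6], x[10], x[14]);
        QR(x[3], x[7], x[11], x[15]);
        QR(x[0], x[5], x[10], x[15]);
        QR(x[1], x[6], x[11], x[12]);
        QR(x[2], x[7], x[8],  x[13]);
        QR(x[3], x[4], x[9],  x[14]);
    }
    for (i = 0; i < 16; i++)
        x[i] += in[i];
    for (i = 0; i < 16; i++) {
        out[4 * i]     = (uint8_t)(x[i]);
        out[4 * i + 1] = (uint8_t)(x[i] >> 8);
        out[4 * i + 2] = (uint8_t)(x[i] >> 16);
        out[4 * i + 3] = (uint8_t)(x[i] >> 24);
    }
}

/* One generator step: the first 32 bytes of ChaCha20_s(0) become the
 * new seed s', and the last 32 bytes are the output k. */
static void csrng_next(uint8_t seed[32], uint8_t out_key[32])
{
    uint8_t block[64];

    chacha20_block(seed, 0, block);
    memcpy(seed, block, 32);
    memcpy(out_key, block + 32, 32);
    memset(block, 0, sizeof(block)); /* erase the transient block */
}
```

Because the seed is overwritten on every step, compromise of the current state does not reveal previously-emitted output, which is the property that makes this construction attractive for per-connection generators.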

ChaCha20 is presently in both LibreSSL and BoringSSL, though not yet in mainline OpenSSL 1.0.2 (but it should be shortly).

Trade-offs

The downside of this is additional memory usage. ChaCha20 requires about 128 bytes of state, so we would be adding around 256 bytes of state to each connection object.

Using our generators

"Public" data can be used directly, even with ECDSA signatures, and no special tricks are required. For the private data, we need to override libcrypto's random method. This is supported by both LibreSSL and OpenSSL, but not BoringSSL (which has removed the ability to override entirely). Thus BoringSSL would use its own, slower, buffer-based system for the private random data. Since there is no risk of overlap between the streams in any case, this seems acceptable.

from s2n-tls.

colmmacc commented on May 21, 2024

Update:

I've got an experimental local branch part way to implementing the above proposal, but in the process I've realised that it's not possible to use different RNGs within libcrypto on a per-connection basis without being non-reentrant. It is possible to set a new default ENGINE for handling RAND, and internally it boils down to a small number of pointer manipulations, so it could be speedy, but if we were interrupted by a signal the random method would be left set pointing at the per-connection method.

Because of that, pragmatically the best thing may be to use a single, thread-local generator for the private libcrypto-generated data (i.e. DH parameters).
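A hypothetical sketch of that thread-local arrangement: one generator per thread rather than per connection, so a single global libcrypto random method can reach the right state re-entrantly without any pointer swapping. The names and struct layout here are illustrative:

```c
#include <stdint.h>
#include <string.h>

/* One "private" generator per thread (C11 _Thread_local). */
struct private_rng {
    uint8_t seed[32];
    int seeded;
};

static _Thread_local struct private_rng private_rng_state;

int private_rng_is_seeded(void)
{
    return private_rng_state.seeded;
}

/* In a real implementation the seed would come from getrandom() or
 * /dev/urandom; the caller supplies it here to keep the sketch small. */
void private_rng_seed(const uint8_t seed[32])
{
    memcpy(private_rng_state.seed, seed, sizeof(private_rng_state.seed));
    private_rng_state.seeded = 1;
}
```

Each thread lazily seeds its own state on first use, so a signal handler interrupting one thread can never observe another connection's generator installed as the process-wide default.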


colmmacc commented on May 21, 2024

PR #61 addresses this issue.

