sipa / minisketch Goto Github PK

Minisketch: an optimized library for BCH-based set reconciliation

License: MIT License

C 5.08% Makefile 1.29% C++ 82.87% Shell 0.25% M4 6.32% Python 4.05% Dockerfile 0.14%

minisketch's Introduction

Minisketch: a library for BCH-based set reconciliation

libminisketch is an optimized standalone MIT-licensed library with C API for constructing and decoding set sketches, which can be used for compact set reconciliation and other applications. It is an implementation of the PinSketch^[1] algorithm. An explanation of the algorithm can be found here.

Sketches for set reconciliation

Sketches, as produced by this library, can be seen as "set checksums" with two peculiar properties:

Sketches have a predetermined capacity, and when the number of elements in the set is not higher than the capacity, libminisketch will always recover the entire set from the sketch. A sketch of b-bit elements with capacity c can be stored in bc bits.
The sketches of two sets can be combined by adding them (XOR) to obtain a sketch of the symmetric difference between the two sets (i.e., all elements that occur in one but not both input sets).

This makes them appropriate for a very bandwidth-efficient set reconciliation protocol. If Alice and Bob each have a set of elements, and they suspect that the sets largely but not entirely overlap, they can use the following protocol to let both parties learn all the elements:

Alice and Bob both compute a sketch of their set elements.
Alice sends her sketch to Bob.
Bob combines the two sketches, and obtains a sketch of the symmetric difference.
Bob tries to recover the elements from the difference sketch.
Bob sends every element in the difference that he has to Alice.

This will always succeed when the size of the difference (elements that Alice has but Bob doesn't plus elements that Bob has but Alice doesn't) does not exceed the capacity of the sketch that Alice sent. The interesting part is that this works regardless of the actual set sizes—only the difference matters.

If the elements are large, it may be preferable to compute the sketches over hashes of the set elements. In that case an additional step is added to the protocol, where Bob also sends the hash of every element he does not have to Alice, who responds with the requested elements.

The doc/ directory has additional tips for designing reconciliation protocols using libminisketch.

Evaluation

The first graph above shows a benchmark of libminisketch against three other set reconciliation algorithms/implementations. The benchmarks were performed using a single core on a system with an Intel Core i7-7820HQ CPU with clock speed locked at 2.4 GHz. The diagram shows the time needed for merging of two sketches and decoding the result. The creation of a sketch on the same machine takes around 5 ns per capacity and per set element. The other implementations are:

pinsketch, the original PinSketch implementation.
cpisync, a software project which implements a number of set reconciliation algorithms and protocols. The included benchmark analyzes the non-probabilistic version of the original CPISync algorithm^[5] only.
A high-performance custom IBLT implementation using 4 hash functions and 32-bit checksums.

For the largest sizes currently of interest to the authors, such as a set of capacity 4096 with 1024 differences, libminisketch is forty-nine times faster than pinsketch and over eight thousand times faster than cpisync. libminisketch is fast enough on realistic set sizes for use on high-traffic network servers where computational resources are limited.

Even where performance is latency-limited, small minisketches can be fast enough to improve performance. On the above i7-7820HQ, a set of 2500 30-bit entries with a difference of 20 elements can be communicated in less time with a minisketch than sending the raw set so long as the communications bandwidth is 1 gigabit per second or less; an eight-element difference can be communicated in better than one-fifth the time on a gigabit link.

The second graph above shows the performance of the same algorithms on the same system, but this time keeping the capacity constant at 128, while varying the number of differences to reconcile between 1 and 128. It shows how cpisync's reconciliation speed is mostly dependent on capacity, while pinsketch/libminisketch are more dependent on number of differences.

The third graph above shows the size overhead of a typical IBLT scheme over the other algorithms (which are near-optimal bandwidth), for various levels of failure probability. IBLT takes tens of times the bandwidth of libminisketch sketches when the set difference size is small and the required failure rate is low.

The fourth graph above shows the effect of the field size on speed in libminisketch. The three lines correspond to:

CLMUL 64-bit: Intel Core i7-7820HQ system at 2.4 GHz
Generic 64-bit: POWER9 CP9M06 system at 2.8 GHz (Talos II)
Generic 32-bit: Cortex-A53 at 1.2 GHz (Raspberry Pi 3B)

It shows how CLMUL implementations are faster for certain fields (specifically, field sizes for which an irreducible polynomial of the form x^b + x + 1 over GF(2) exists, and to a lesser extent, fields which are a multiple of 8 bits). It also shows how (for now) a significant performance drop exists for fields larger than 32 bits on 32-bit platforms. Note that the three lines are not at the same scale (the Raspberry Pi 3B is around 10x slower for 32-bit fields than the Core i7; the POWER9 is around 1.3x slower).

Below we compare the PinSketch algorithm (which libminisketch is an implementation of) with other set reconciliation algorithms:

Algorithm	Sketch size	Decode success	Decoding complexity	Difference type	Secure sketch
CPISync^[2]	(b+1)c	Always	O(n³)	Both	Yes
PinSketch^[1]	bc	Always	O(n²)	Symmetric only	Yes
IBLT^[6][7]	αbc (see graph 3)	Probabilistic	O(n)	Depends	No

Sketch size: This column shows the size in bits of a sketch designed for reconciling c different b-bit elements. PinSketch and CPISync have a near-optimal^[11] communication overhead, which in practice means the sketch size is very close (or equal to) bc bits. That is the same size as would be needed to transfer the elements of the difference naively (which is remarkable, as the difference isn't even known by the sender). For IBLT there is an overhead factor α, which depends on various design parameters, but is often between 2 and 10.
Decode success: Whenever a sketch is designed with a capacity not lower than the actual difference size, CPISync and PinSketch guarantee that decoding of the difference will always succeed. IBLT always has a chance of failure, though that chance can be made arbitrarily small by increasing the communication overhead.
Decoding complexity: The space savings achieved by near-optimal algorithms come at a cost in performance, as their asymptotic decode complexity is quadratic or cubic, while IBLT is linear. This means that using near-optimal algorithms can be too expensive for applications where the difference is sufficiently large.
Difference type: PinSketch can only compute the symmetric difference from a merged sketch, while CPISync and IBLT can distinguish which side certain elements were missing on. When the decoder has access to one of the sets, this generally doesn't matter, as he can look up each of the elements in the symmetric difference with one of the sets.
Secure sketch: Whether the sketch satisfies the definition of a secure sketch^[1], which implies a minimal amount about a set can be extracted from a sketch by anyone who does not know most of the elements already. This makes the algorithm appropriate for applications like fingerprint authentication.

Building

The build system is very rudimentary for now, and improvements are welcome.

The following may work and produce a libminisketch.a file you can link against:

git clone https://github.com/sipa/minisketch
cd minisketch
./autogen.sh && ./configure && make

Usage

In this section Alice and Bob are trying to find the difference between their sets. Alice has the set [3000 ... 3009], while Bob has [3002 ... 3011].

First, Alice creates a sketch:

#include <stdio.h>
#include <assert.h>
#include "../include/minisketch.h"
int main(void) {

  minisketch *sketch_a = minisketch_create(12, 0, 4);

The arguments are:

The field size b, which specifies the size of the elements being reconciled. With a field size b, the supported range of set elements is the integers from 1 to 2^b - 1, inclusive. Note that elements cannot be zero.
The implementation number. Implementation 0 is always supported, but more efficient algorithms may be available on some hardware. The serialized form of a sketch is independent of the implementation, so different implementations can interoperate.
The capacity c, which specifies how many differences the resulting sketch can reconcile.

Then Alice adds her elements to her sketch. Note that adding the same element a second time removes it again, as sketches have set semantics, not multiset semantics.

  for (int i = 3000; i < 3010; ++i) {
    minisketch_add_uint64(sketch_a, i);
  }

The next step is serializing the sketch into a byte array:

  size_t sersize = minisketch_serialized_size(sketch_a);
  assert(sersize == 12 * 4 / 8); // 4 12-bit values is 6 bytes.
  unsigned char *buffer_a = malloc(sersize);
  minisketch_serialize(sketch_a, buffer_a);
  minisketch_destroy(sketch_a);

The contents of the buffer can then be submitted to Bob, who can create his own sketch:

  minisketch *sketch_b = minisketch_create(12, 0, 4); // Bob's own sketch
  for (int i = 3002; i < 3012; ++i) {
    minisketch_add_uint64(sketch_b, i);
  }

After Bob receives Alice's serialized sketch, he can reconcile:

  sketch_a = minisketch_create(12, 0, 4);     // Alice's sketch
  minisketch_deserialize(sketch_a, buffer_a); // Load Alice's sketch
  free(buffer_a);

  // Merge the elements from sketch_a into sketch_b. The result is a sketch_b
  // which contains all elements that occurred in Alice's or Bob's sets, but not
  // in both.
  minisketch_merge(sketch_b, sketch_a);

  uint64_t differences[4];
  ssize_t num_differences = minisketch_decode(sketch_b, 4, differences);
  minisketch_destroy(sketch_a);
  minisketch_destroy(sketch_b);
  if (num_differences < 0) {
    printf("More than 4 differences!\n");
  } else {
    ssize_t i;
    for (i = 0; i < num_differences; ++i) {
      printf("%u is in only one of the two sets\n", (unsigned)differences[i]);
    }
  }
}

In this example Bob would see output such as:

$ gcc -std=c99 -Wall -Wextra -o example ./doc/example.c -Lsrc/ -lminisketch -lstdc++ && ./example
3000 is in only one of the two sets
3011 is in only one of the two sets
3001 is in only one of the two sets
3010 is in only one of the two sets

The order of the output is arbitrary and will differ on different runs of minisketch_decode().

Applications

Communications efficient set reconciliation has been proposed to optimize Bitcoin transaction distribution^[8], which would allow Bitcoin nodes to have many more peers while reducing bandwidth usage. It could also be used for Bitcoin block distribution^[9], particularly for very low bandwidth links such as satellite. A similar approach (CPISync) is used by PGP SKS keyservers to synchronize their databases efficiently. Secure sketches can also be used as helper data to reliably extract a consistent cryptographic key from fuzzy biometric data while leaking minimal information^[1]. They can be combined with dcnets to create cryptographic multiparty anonymous communication^[10].

Implementation notes

libminisketch is written in C++11, but has a C API for compatibility reasons.

Specific algorithms and optimizations used:

Finite field implementations:
- A generic implementation using C unsigned integer bit operations, and one using the CLMUL instruction where available. The latter has specializations for different classes of fields that permit optimizations (those with trinomial irreducible polynomials, and those whose size is a multiple of 8 bits).
- Precomputed tables for (repeated) squaring, and for solving equations of the form x² + x = a^[2].
- Inverses are computed using an exponentiation ladder^[12] on systems where multiplications are relatively fast, and using an extended GCD algorithm otherwise.
- Repeated multiplications are accelerated using runtime precomputations on systems where multiplications are relatively slow.
- The serialization of field elements always represents them as bits that are coefficients of the lowest-weight (using lexicographic order as tie breaker) irreducible polynomials over GF(2) (see this list), but for some implementations they are converted to a different representation internally.
The sketch algorithms are specialized for each separate field implementation, permitting inlining and specific optimizations while avoiding dynamic allocations and branching costs.
Decoding of sketches uses the Berlekamp-Massey algorithm^[3] to compute the characteristic polynomial.
Finding the roots of polynomials is done using the Berlekamp trace algorithm with explicit formula for quadratic polynomials^[4]. The root finding is randomized to prevent adversarial inputs that intentionally trigger worst-case decode time.
A (possibly) novel optimization combines a test for unique roots with the Berlekamp trace algorithm.

Some improvements that are still TODO:

Explicit formulas for the roots of polynomials of higher degree than 2
Subquadratic multiplication and modulus algorithms
The Half-GCD algorithm for faster GCDs
An interface for incremental decoding: most of the computation in most failed decodes can be reused when attempting to decode a longer sketch of the same set
Platform specific optimizations for platforms other than x86
Avoid using slow uint64_t for calculations on 32-bit hosts
Optional IBLT / Hybrid and set entropy coder under the same interface

References

[1] Dodis, Ostrovsky, Reyzin and Smith. Fuzzy Extractors: How to Generate Strong Keys from Biometrics and Other Noisy Data. SIAM Journal on Computing, volume 38, number 1, pages 97-139, 2008). [URL] [PDF]
[5] A. Trachtenberg, D. Starobinski and S. Agarwal. Fast PDA synchronization using characteristic polynomial interpolation. Proceedings, Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies, New York, NY, USA, 2002, pp. 1510-1519 vol.3. [PDF]
[2] Cherly, Jørgen, Luis Gallardo, Leonid Vaserstein, and Ethel Wheland. Solving quadratic equations over polynomial rings of characteristic two. Publicacions Matemàtiques (1998): 131-142. [PDF]
[3] J. Massey. Shift-register synthesis and BCH decoding. IEEE Transactions on Information Theory, vol. 15, no. 1, pp. 122-127, January 1969. [PDF]
[4] Bhaskar Biswas, Vincent Herbert. Efficient Root Finding of Polynomials over Fields of Characteristic 2. 2009. hal-00626997. [URL] [PDF]
[6] Eppstein, David, Michael T. Goodrich, Frank Uyeda, and George Varghese. What's the difference?: efficient set reconciliation without prior context. ACM SIGCOMM Computer Communication Review, vol. 41, no. 4, pp. 218-229. ACM, 2011. [PDF]
[7] Goodrich, Michael T. and Michael Mitzenmacher. Invertible bloom lookup tables. 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton) (2011): 792-799. [PDF]
[8] Maxwell, Gregory F. Blocksonly mode BW savings, the limits of efficient block xfer, and better relay Bitcointalk 2016, Technical notes on mempool synchronizing relay #bitcoin-wizards 2016.
[9] Maxwell, Gregory F. Block network coding Bitcoin Wiki 2014, Technical notes on efficient block xfer #bitcoin-wizards 2015.
[10] Ruffing, Tim, Moreno-Sanchez, Pedro, Aniket, Kate, P2P Mixing and Unlinkable Bitcoin Transactions NDSS Symposium 2017 [URL] [PDF]
[11] Y. Misky, A. Trachtenberg, R. Zippel. Set Reconciliation with Nearly Optimal Communication Complexity. Cornell University, 2000. [URL] [PDF]
[12] Itoh, Toshiya, and Shigeo Tsujii. "A fast algorithm for computing multiplicative inverses in GF (2m) using normal bases." Information and computation 78, no. 3 (1988): 171-177. [URL]

minisketch's People

Contributors

Stargazers

Watchers

minisketch's Issues

Precomputed tables instead of C++-compiler generated?

The linear mapping tables for squaring/squarerooting/solving x^2+x+a/converting between field representations are currently all encoded very compactly in the C++ code, and get expanded at compile time (through constexpr) into full lookup tables.

Given that the compact encoding (mapping of every power-of-two) is already generated using Sage code, and we have little hope of avoiding that, I wonder if we shouldn't drop the whole constexpr construction, and instead generate the full expanded tables directly in Sage. That would massively increase the source code size, but probably cut down compilation times too (and perhaps enable a few small optimizations too).

One reason to not do this is if we'd want to make the layout of the tables dynamic (for example, we realize that on certain hardware constructing the table using 6-bit lookups is sufficiently suboptimal that we want to use a different layout there). If the tables are stored in expanded form, that would require duplicating the tables.

Errors when building with MSVC

There is a minisketch_tests.cpp test in the Bitcoin Core project. Being built with MSVC, it fails in different scenarios:

A similar issue has been observed when using experimental CMake-based build system as well.

Another simple way to reproduce the issue is to build libminisketch without clmul instructions support, see this branch:

C:\Users\hebasto\bitcoin>build_msvc\x64\Release\test_bitcoin.exe -t minisketch_tests
Running 1 test case...
C:/Users/hebasto/bitcoin/src/test/minisketch_tests.cpp(43): fatal error: in "minisketch_tests/minisketch_test": critical check dec.has_value() has failed

*** 1 failure is detected in the test module "Bitcoin Core Test Suite"

C:\Users\hebasto\bitcoin>msbuild --version
Microsoft (R) Build Engine version 17.2.1+52cd2da31 for .NET Framework
Copyright (C) Microsoft Corporation. All rights reserved.

17.2.1.25201

UPDATE: also see #67 (comment)

`-Wnonnull` warnings when configured with `--enable-fields={11,23}`

A build system is Ubuntu 22.04:

$ g++ --version
g++ (Ubuntu 11.2.0-19ubuntu1) 11.2.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

An excerpt from the build log:

$ ./configure --enable-fields=11 && make clean && make
...
  enable clmul fields     = yes
...
src/fields/clmul_common_impl.h:113:71: warning: ‘this’ pointer is null [-Wnonnull]
  113 |     static inline constexpr I Sqr8(I a) { return SQR8->template Map<O>(a); }
      |                                                  ~~~~~~~~~~~~~~~~~~~~~^~~
...
src/fields/clmul_common_impl.h:114:73: warning: ‘this’ pointer is null [-Wnonnull]
  114 |     static inline constexpr I Sqr16(I a) { return SQR16->template Map<O>(a); }
      |                                                   ~~~~~~~~~~~~~~~~~~~~~~^~~
...

Performance figures / graph for minisketch with subdivision?

I've seen a number of comments that incorrectly dismiss this work as useless 'because quadratic', which is obviously non-sense on it's face but also because our recommendation of using splitting makes the overall time linear -- for applications that don't need the information theoretic properties. We should perhaps add a minisketch with subdivision to our graphs.

warning: 'this' pointer is null in fields/clmul_common_impl.h

Building 4c1d32b with GCC 11 (gcc-11 (Ubuntu 11.1.0-1ubuntu1~21.04) 11.1.0):

In file included from src/fields/clmul_4bytes.cpp:11:
src/fields/clmul_common_impl.h: In instantiation of 'static constexpr I {anonymous}::GenField<I, B, MOD, MUL, F, SQR, SQR2, SQR4, SQR8, SQR16, QRT, T, LOAD, SAVE>::Sqr16(I) [with I = unsigned int; int B = 25; I MOD = 9; I (* MUL)(I, I) = {anonymous}::MulWithClMulReduce<unsigned int, 25, 9>; F = RecLinTrans<unsigned int, 5, 5, 5, 5, 5>; const F* SQR = (& {anonymous}::SQR_TABLE_25); const F* SQR2 = (& {anonymous}::SQR2_TABLE_25); const F* SQR4 = (& {anonymous}::SQR4_TABLE_25); const F* SQR8 = (& {anonymous}::SQR8_TABLE_25); const F* SQR16 = 0; const F* QRT = (& {anonymous}::QRT_TABLE_25); T = IdTrans; const T* LOAD = (& ID_TRANS); const T* SAVE = (& ID_TRANS)]':
src/fields/clmul_common_impl.h:140:46:   required from '{anonymous}::GenField<I, B, MOD, MUL, F, SQR, SQR2, SQR4, SQR8, SQR16, QRT, T, LOAD, SAVE>::Elem {anonymous}::GenField<I, B, MOD, MUL, F, SQR, SQR2, SQR4, SQR8, SQR16, QRT, T, LOAD, SAVE>::Inv({anonymous}::GenField<I, B, MOD, MUL, F, SQR, SQR2, SQR4, SQR8, SQR16, QRT, T, LOAD, SAVE>::Elem) const [with I = unsigned int; int B = 25; I MOD = 9; I (* MUL)(I, I) = {anonymous}::MulWithClMulReduce<unsigned int, 25, 9>; F = RecLinTrans<unsigned int, 5, 5, 5, 5, 5>; const F* SQR = (& {anonymous}::SQR_TABLE_25); const F* SQR2 = (& {anonymous}::SQR2_TABLE_25); const F* SQR4 = (& {anonymous}::SQR4_TABLE_25); const F* SQR8 = (& {anonymous}::SQR8_TABLE_25); const F* SQR16 = 0; const F* QRT = (& {anonymous}::QRT_TABLE_25); T = IdTrans; const T* LOAD = (& ID_TRANS); const T* SAVE = (& ID_TRANS); {anonymous}::GenField<I, B, MOD, MUL, F, SQR, SQR2, SQR4, SQR8, SQR16, QRT, T, LOAD, SAVE>::Elem = unsigned int]'
src/fields/../sketch_impl.h:302:34:   required from 'std::vector<typename F::Elem> BerlekampMassey(const std::vector<typename F::Elem>&, size_t, const F&) [with F = {anonymous}::GenField<unsigned int, 25, 9, {anonymous}::MulWithClMulReduce<unsigned int, 25, 9>, RecLinTrans<unsigned int, 5, 5, 5, 5, 5>, (& {anonymous}::SQR_TABLE_25), (& {anonymous}::SQR2_TABLE_25), (& {anonymous}::SQR4_TABLE_25), (& {anonymous}::SQR8_TABLE_25), 0, (& {anonymous}::QRT_TABLE_25), IdTrans, (& ID_TRANS), (& ID_TRANS)>; typename F::Elem = unsigned int; size_t = long unsigned int]'
src/fields/../sketch_impl.h:397:36:   required from 'int SketchImpl<F>::Decode(int, uint64_t*) const [with F = {anonymous}::GenField<unsigned int, 25, 9, {anonymous}::MulWithClMulReduce<unsigned int, 25, 9>, RecLinTrans<unsigned int, 5, 5, 5, 5, 5>, (& {anonymous}::SQR_TABLE_25), (& {anonymous}::SQR2_TABLE_25), (& {anonymous}::SQR4_TABLE_25), (& {anonymous}::SQR8_TABLE_25), 0, (& {anonymous}::QRT_TABLE_25), IdTrans, (& ID_TRANS), (& ID_TRANS)>; uint64_t = long unsigned int]'
src/fields/../sketch_impl.h:394:9:   required from here
src/fields/clmul_common_impl.h:114:73: warning: 'this' pointer is null [-Wnonnull]
  114 |     static inline constexpr I Sqr16(I a) { return SQR16->template Map<O>(a); }
      |                                                   ~~~~~~~~~~~~~~~~~~~~~~^~~
In file included from src/fields/clmul_common_impl.h:14,
                 from src/fields/clmul_4bytes.cpp:11:
src/fields/../lintrans.h:134:24: note: in a call to non-static member function 'constexpr I RecLinTrans<I, N, X ...>::Map(I) const [with O = BitsInt<unsigned int, 25>; int P = 0; I = unsigned int; int N = 5; int ...X = {5, 5, 5, 5}]'
  134 |     inline I constexpr Map(I a) const { return trans.template Map<O, P>(a) ^ rec.template Map<O, P + N>(a); }

Full output.

I haven't looked into this, but noting it here as if this is going to be integrated into Cores build system, this will likely need to be fixed / suppressed etc.

minisketch.h uses deprecated C feature

-Wall -Werror gives:

external/minisketch/include/minisketch.h:27:1: error: function declaration isn’t a prototype [-Werror=strict-prototypes]
 uint32_t minisketch_implementation_max();
 ^~~~~~~~

For C code, this needs to be '(void)'. I think that's valid C++, but if not, it needs to go in #ifndef __cplusplus.

Add known response tests for each field size

There should be one (or better, more) fixed inputs with known results that get good coverage (e.g. contain a failed split, a quad-root, etc.) for each supported size.

Probably with the input being a serialized sketch and a list of its roots. The test should decode, check the roots, reencode/reserialize and compare the serialized sketch.

There should also be at least one failure to decode case for each size that gets good coverage, several with ones selected to hit each of the main failure conditions.

My current strategy: I have instrumented the code and am running both random and (small) exhaustive tests to collect many examples that have distinct "execution fingerprints" based on the shape of the execution trace. For larger fields there are too many to use all the distinct fingerprints, so I will solve a minimal-cover problem to select a subset. I have data for 2, 3 bit fields, but collection is slow since I'm still burning most of my cpu on iblt data collection.

[This wasn't interesting before we were done with the bitstream, opening this now to track it...]

Strict c++11 std::initializer_list constexpr error

For some crazy reason, std::initializer_list::begin() is not constexpr in c++11. Though libstdc++ doesn't care, libc++ does. Building as c++14 works fine.

Before I try to learn the code and hack something together, maybe @sipa sees an easy fix?

fields/generic_1byte.cpp:22:22: error: constexpr variable 'SQR_TABLE_2' must be initialized by a constant expression
constexpr StatTable2 SQR_TABLE_2{0x1, 0x3};
^~~~~~~~~~~~~~~~~~~~~
fields/../lintrans.h:115:72: note: non-constexpr function 'begin' cannot be used in a constant expression
constexpr RecLinTrans(std::initializer_list init) : RecLinTrans(init.begin(), Num()) {}
^
fields/generic_1byte.cpp:22:22: note: in call to 'RecLinTrans({&{1, 3}[0], 2})'
constexpr StatTable2 SQR_TABLE_2{0x1, 0x3};
^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/initializer_list:89:16: note: declared here
const _Ep* begin() const _NOEXCEPT {return _begin;}
^
fields/generic_1byte.cpp:23:22: error: constexpr variable 'QRT_TABLE_2' must be initialized by a constant expression
constexpr StatTable2 QRT_TABLE_2{0x2, 0};
...
...

Annotate implicit fallthroughs

Now that we've got -Wimplicit-fallthrough turned on in the Core repo, we'll see this output during builds including minisketch (I've put together a branch using the new build system changes, to test bitcoin/bitcoin#21859, here: https://github.com/fanquake/bitcoin/tree/minisketch_integration).

We turned this warning on in bitcoin/bitcoin#21430, and for at least leveldb, made use of it's FALLTHROUGH_INTENDED macro. See details here: https://github.com/bitcoin-core/leveldb/blob/f8ae182c1e5176d12e816fb2217ae33a5472fdd7/util/hash.cc#L11.

  CXX      minisketch/src/libminisketch_a-minisketch.o
minisketch/src/minisketch.cpp:104:13: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
            case 2:
            ^
minisketch/src/minisketch.cpp:104:13: note: insert '[[fallthrough]];' to silence this warning
            case 2:
            ^
            [[fallthrough]]; 
minisketch/src/minisketch.cpp:104:13: note: insert 'break;' to avoid fall-through
            case 2:
            ^
            break; 
minisketch/src/minisketch.cpp:107:13: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
            case 3:
            ^
minisketch/src/minisketch.cpp:107:13: note: insert '[[fallthrough]];' to silence this warning
            case 3:
            ^
            [[fallthrough]]; 
minisketch/src/minisketch.cpp:107:13: note: insert 'break;' to avoid fall-through
            case 3:
            ^
            break; 
minisketch/src/minisketch.cpp:110:13: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
            case 4:
            ^
minisketch/src/minisketch.cpp:110:13: note: insert '[[fallthrough]];' to silence this warning
            case 4:
            ^
            [[fallthrough]]; 
minisketch/src/minisketch.cpp:110:13: note: insert 'break;' to avoid fall-through
            case 4:
            ^
            break; 
minisketch/src/minisketch.cpp:113:13: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
            case 5:
            ^
minisketch/src/minisketch.cpp:113:13: note: insert '[[fallthrough]];' to silence this warning
            case 5:
            ^
            [[fallthrough]]; 
minisketch/src/minisketch.cpp:113:13: note: insert 'break;' to avoid fall-through
            case 5:
            ^
            break; 
minisketch/src/minisketch.cpp:116:13: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
            case 6:
            ^
minisketch/src/minisketch.cpp:116:13: note: insert '[[fallthrough]];' to silence this warning
            case 6:
            ^
            [[fallthrough]]; 
minisketch/src/minisketch.cpp:116:13: note: insert 'break;' to avoid fall-through
            case 6:
            ^
            break; 
minisketch/src/minisketch.cpp:119:13: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
            case 7:
            ^
minisketch/src/minisketch.cpp:119:13: note: insert '[[fallthrough]];' to silence this warning
            case 7:
            ^
            [[fallthrough]]; 
minisketch/src/minisketch.cpp:119:13: note: insert 'break;' to avoid fall-through
            case 7:
            ^
            break; 
minisketch/src/minisketch.cpp:122:13: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
            case 8:
            ^
minisketch/src/minisketch.cpp:122:13: note: insert '[[fallthrough]];' to silence this warning
            case 8:
            ^
            [[fallthrough]]; 
minisketch/src/minisketch.cpp:122:13: note: insert 'break;' to avoid fall-through
            case 8:
            ^
            break; 
minisketch/src/minisketch.cpp:125:13: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
            default:
            ^
minisketch/src/minisketch.cpp:125:13: note: insert '[[fallthrough]];' to silence this warning
            default:
            ^
            [[fallthrough]]; 
minisketch/src/minisketch.cpp:125:13: note: insert 'break;' to avoid fall-through
            default:
            ^
            break; 
8 warnings generated.

warning: shift count >= width of type [-Wshift-count-overflow]

Clang spits out lots of these warnings, I assume they all have the same cause.

Looks like an unsigned char is being shifted 8 bits. From a quick glance, it seems to be unreachable.

In file included from src/fields/generic_1byte.cpp:11:
In file included from src/fields/generic_common_impl.h:13:
src/fields/../int_utils.h:69:17: warning: shift count >= width of type [-Wshift-count-overflow]
val >>= 8;
^ ~
src/fields/generic_common_impl.h:78:57: note: in instantiation of function template specialization 'BitWriter::Write<2, unsigned char>' requested here
void Serialize(BitWriter& out) const { out.template Write<BITS, I>(m_val); }
^
src/fields/../sketch_impl.h:378:17: note: in instantiation of member function '(anonymous namespace)::Field<unsigned char, 2, 3, RecLinTrans<unsigned char, 2>, RecLinTrans<unsigned char, 2>, &(anonymous namespace)::SQR_TABLE_2, &(anonymous namespace)::QRT_TABLE_2>::Serialize' requested here
val.Serialize(writer);
^
src/fields/../sketch_impl.h:359:5: note: in instantiation of member function 'SketchImpl<(anonymous namespace)::Field<unsigned char, 2, 3, RecLinTrans<unsigned char, 2>, RecLinTrans<unsigned char, 2>, &(anonymous namespace)::SQR_TABLE_2, &(anonymous namespace)::QRT_TABLE_2> >::Serialize' requested here
SketchImpl(int implementation) : Sketch(implementation, F::BITS) {
^
src/fields/generic_1byte.cpp:73:24: note: in instantiation of member function 'SketchImpl<(anonymous namespace)::Field<unsigned char, 2, 3, RecLinTrans<unsigned char, 2>, RecLinTrans<unsigned char, 2>, &(anonymous namespace)::SQR_TABLE_2, &(anonymous namespace)::QRT_TABLE_2> >::SketchImpl' requested here
case 2: return new SketchImpl(implementation);

Fix or disable conversion warnings?

By default, Xcode adds warnings for implicit conversions that lose precision like:
Implicit conversion loses integer precision: 'size_t' (aka 'unsigned long') to 'uint32_t' (aka 'unsigned int')

Noticed when testing the XCode output of #75. There are 10 instances of this and it's pretty annoying in the build logs.

For example:

size_t ComputeCapacity(uint32_t bits, size_t max_elements, uint32_t fpbits) {
    if (bits == 0) return 0;
    uint64_t base_fpbits = BaseFPBits(bits, max_elements);

Is there any interest in cleaning these up? Or would it be preferred to disable them with -Wconversion ?

Test performance with -mlzcnt

At the moment the __builtin_clz* compile down to bsrq on x86_64. Compiling with -mlzcnt wires up the actual instruction.

CountBits<unsigned long long> without -mlzcnt:

_Z9CountBitsmi:
.LFB189:
    .cfi_startproc
    endbr64
    xorl    %eax, %eax
    testq   %rdi, %rdi
    je  .L1
    bsrq    %rdi, %rdi
    movl    $64, %eax
    xorq    $63, %rdi
    subl    %edi, %eax

CountBits<unsigned long long> with -mlzcnt:

_Z9CountBitsmi:
.LFB189:
    .cfi_startproc
    endbr64
    xorl    %eax, %eax
    testq   %rdi, %rdi
    je  .L1
    movl    $64, %eax
    lzcntq  %rdi, %rdi
    subl    %edi, %eax

I'm unable to test the significance of that because my CPU does not support the instruction. But I assume @sipa would probably know right away whether it's worth bothering.

$ clang-12 --version
Ubuntu clang version 12.0.0-3ubuntu1~20.04.4
Target: armv7l-unknown-linux-gnueabihf
Thread model: posix
InstalledDir: /usr/bin
$ make
...
  CXX      minisketch/src/libminisketch_a-minisketch.o
minisketch/src/minisketch.cpp:66:20: warning: unused function 'EnableClmul' [-Wunused-function]
static inline bool EnableClmul()
                   ^
1 warning generated.
...

minisketch gives wrong results (with overfull sketch)

With the following change to example.c

$ diff -Nur ../doc/example.c example.c 
--- ../doc/example.c    2019-04-25 15:42:26.440839823 +0800
+++ example.c   2019-05-07 13:03:57.792123944 +0800
@@ -12,7 +12,8 @@
 
   minisketch *sketch_a = minisketch_create(12, 0, 4);
 
-  for (int i = 3000; i < 3010; ++i) {
+  for (int i = 1; i < 30; ++i) {
+    printf("Add to set a: %u \n", i);
     minisketch_add_uint64(sketch_a, i);
   }
 
@@ -23,7 +24,8 @@
   minisketch_destroy(sketch_a);
 
   minisketch *sketch_b = minisketch_create(12, 0, 4); // Bob's own sketch
-  for (int i = 3002; i < 3012; ++i) {
+  for (int i = 50; i < 100; ++i) {
+    printf("Add to set b: %u \n", i);
     minisketch_add_uint64(sketch_b, i);
   }

Full example.cc:

  1 /**********************************************************************
  2  * Copyright (c) 2018 Pieter Wuille, Greg Maxwell, Gleb Naumenko      *
  3  * Distributed under the MIT software license, see the accompanying   *
  4  * file LICENSE or http://www.opensource.org/licenses/mit-license.php.*
  5  **********************************************************************/
  6 
  7 #include <stdio.h>
  8 #include <assert.h>
  9 #include "../include/minisketch.h"
 10 
 11 int main(void) {
 12 
 13   minisketch *sketch_a = minisketch_create(12, 0, 4);
 14 
 15   for (int i = 1; i < 30; ++i) {
 16     printf("Add to set a: %u \n", i);
 17     minisketch_add_uint64(sketch_a, i);
 18   }
 19 
 20   size_t sersize = minisketch_serialized_size(sketch_a);
 21   assert(sersize == 12 * 4 / 8); // 4 12-bit values is 6 bytes.
 22   unsigned char *buffer_a = malloc(sersize);
 23   minisketch_serialize(sketch_a, buffer_a);
 24   minisketch_destroy(sketch_a);
 25 
 26   minisketch *sketch_b = minisketch_create(12, 0, 4); // Bob's own sketch
 27   for (int i = 50; i < 100; ++i) {
 28     printf("Add to set b: %u \n", i);
 29     minisketch_add_uint64(sketch_b, i);
 30   }
 31 
 32   sketch_a = minisketch_create(12, 0, 4);     // Alice's sketch
 33   minisketch_deserialize(sketch_a, buffer_a); // Load Alice's sketch
 34   free(buffer_a);
 35 
 36   // Merge the elements from sketch_a into sketch_b. The result is a sketch_b
 37   // which contains all elements that occurred in Alice's or Bob's sets, but not
 38   // in both.
 39   minisketch_merge(sketch_b, sketch_a);
 40 
 41   uint64_t differences[4];
 42   ssize_t num_differences = minisketch_decode(sketch_b, 4, differences);
 43   minisketch_destroy(sketch_a);
 44   minisketch_destroy(sketch_b);
 45   if (num_differences < 0) {
 46     printf("More than 4 differences!\n");
 47   } else {
 48     ssize_t i;
 49     for (i = 0; i < num_differences; ++i) {
 50       printf("%u is in only one of the two sets\n", (unsigned)differences[i]);
 51     }
 52   }
 53 }

It looks like minisketch gives wrong results. What am I missing?

$ ./example
Add to set a: 1
Add to set a: 2
Add to set a: 3
Add to set a: 4
Add to set a: 5
Add to set a: 6
Add to set a: 7
Add to set a: 8
Add to set a: 9
Add to set a: 10
Add to set a: 11
Add to set a: 12
Add to set a: 13
Add to set a: 14
Add to set a: 15
Add to set a: 16
Add to set a: 17
Add to set a: 18
Add to set a: 19
Add to set a: 20
Add to set a: 21
Add to set a: 22
Add to set a: 23
Add to set a: 24
Add to set a: 25
Add to set a: 26
Add to set a: 27
Add to set a: 28
Add to set a: 29
Add to set b: 50
Add to set b: 51
Add to set b: 52
Add to set b: 53
Add to set b: 54
Add to set b: 55
Add to set b: 56
Add to set b: 57
Add to set b: 58
Add to set b: 59
Add to set b: 60
Add to set b: 61
Add to set b: 62
Add to set b: 63
Add to set b: 64
Add to set b: 65
Add to set b: 66
Add to set b: 67
Add to set b: 68
Add to set b: 69
Add to set b: 70
Add to set b: 71
Add to set b: 72
Add to set b: 73
Add to set b: 74
Add to set b: 75
Add to set b: 76
Add to set b: 77
Add to set b: 78
Add to set b: 79
Add to set b: 80
Add to set b: 81
Add to set b: 82
Add to set b: 83
Add to set b: 84
Add to set b: 85
Add to set b: 86
Add to set b: 87
Add to set b: 88
Add to set b: 89
Add to set b: 90
Add to set b: 91
Add to set b: 92
Add to set b: 93
Add to set b: 94
Add to set b: 95
Add to set b: 96
Add to set b: 97
Add to set b: 98
Add to set b: 99
3594 is in only one of the two sets
3595 is in only one of the two sets
3622 is in only one of the two sets
3623 is in only one of the two sets

Build on Fedora 29: gcc version 8.3.1 20190223 (Red Hat 8.3.1-2) (GCC)

Wrong output

With the following input:
a:[62997785125752341,16259198011488262487]
b:[13517358214702166750,62997785125752341]
minisketch_create(64, 0, 1);

Output:
1 difference is found, value is -294048887