This implementation of FastCDC has changed the Gear rollsum to use a right-shift instead of a left-shift. This is done so that the least-significant bits of the hash include data from all 32 previous bytes in the rolling window, so that "zero-padding" of the hash judgement mask is not required.
However, this is problematic because it means bytes don't fully "expire" from the hash, so it no longer only includes the last 32 bytes, and bytes from the very start of the block can modify the hash. This is because the addition in the Gear rollsum also
propagates some bit changes upwards, so the shift-right might never fully expire a byte out of the hash. In some ways this improves the entropy/distribution of the hash, but it also means a change early in a chunk can change the break point at the end, messing with resynchronization.
One way to address this would be to replace the addition with an xor, so that bit changes don't propagate upwards. This would make it like a simplified buzzhash. However, this would further weaken the hash, and Gear is already very weak.
The number of bits in the Gear rollsum directly limits how many bytes are included in the sliding window. Reducing the number of bits from 32 to 31 directly reduces the sliding window size. The Gear rollsum is already a not particularly strong rollsum, so this could matter. In languages that don't have uint32 types this might be unavoidable, but copying this in languages without that limitation is unnecessarily limiting the rollsum strength.
This implementation has been widely copied by other implementations, so this problem affects at least the following implementations as well;
For the record, FastCDC's zero-padding solution when using a left-shift is also IMHO sub-optimal. We know that the most significant bits include the most information about the past 32 bytes, so the best solution is simply use the N most significant bits for a 2^N expected size. This can be done with a mask to check the upper bits are zero as in the FastCDC paper like this;
MASK = (-1 << (32-N))
if (!(hash & MASK))
<it's a chunk boundary>
But a shift-right is probably just as fast and an even simpler way to do the same thing;
SHIFT = 32 - N
if !(hash >> SHIFT):
<it's a chunk boundary>
However, just as fast but even more flexible is to check if the hash value is smaller than a threshold. This allows arbitrary expected size values, not just powers-of-2. It uses all the bits in the hash and treats the most significant bits as most important. This matches nicely with the characteristics of the Gear rollsum;
THRESHOLD = 2^32 / tgt_size
if (hash < THRESHOLD):
<it's a chunk boundary>
Note the above suggestions assume uint32 support. They will need obvious modifications if using a 31 bit hash.