There are several compression schemes in this code.
There are two data types you can use for quantization, fp4 and fp8, both with an IEEE 754-style [sign, exponent, mantissa] bit layout:
- fp4: [sign, exponent, mantissa] = [1,2,1]
- fp8: [sign, exponent, mantissa] = [1,5,2]
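As a rough illustration of what these layouts mean numerically, the sketch below rounds a float to the nearest value representable in a given [1, exponent, mantissa] format. It is a simplified model (round-to-nearest, subnormals supported, top exponent field reserved as in IEEE 754, values saturated at the largest finite magnitude), not the repository's actual encoder:

```python
import math

def quantize(x, exp_bits, man_bits):
    """Round x to the nearest value representable in a
    [1, exp_bits, man_bits] floating-point format.
    Bias = 2**(exp_bits - 1) - 1, as in IEEE 754.
    A simplified sketch: no Inf/NaN encoding, overflow saturates."""
    if x == 0.0:
        return 0.0
    bias = 2 ** (exp_bits - 1) - 1
    sign = -1.0 if x < 0 else 1.0
    ax = abs(x)
    e = math.floor(math.log2(ax))
    e = max(e, 1 - bias)                  # below this, values are subnormal
    e = min(e, bias)                      # largest normal exponent
    scale = 2.0 ** (e - man_bits)         # size of one mantissa step
    q = round(ax / scale) * scale         # round-to-nearest on the grid
    # Saturate at the largest finite value of the format.
    max_val = (2 - 2.0 ** (-man_bits)) * 2.0 ** bias
    return sign * min(q, max_val)

# fp8 [1,5,2]: quantize(0.7, 5, 2) snaps 0.7 onto the E5M2 grid.
# fp4 [1,2,1]: quantize(-2.2, 2, 1) snaps -2.2 onto the 4-bit grid.
```

Note how coarse fp4 is: with one mantissa bit there are only a handful of distinct magnitudes, which is why quantization error handling matters.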
There is an error correction coefficient in this algorithm, which controls the magnitude of the error term that is fed back into later steps.
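One common way such a coefficient is used is in error feedback: the residual left over from the previous compression step is scaled by a coefficient and added back before compressing again. The sketch below shows this pattern in general form; the names `beta` and `compress_with_feedback` are illustrative, not the names used in this codebase:

```python
def compress_with_feedback(grad, residual, beta, quantize_fn):
    """One step of error feedback.
    beta is a hypothetical error correction coefficient:
    beta = 0 disables feedback, beta = 1 is full error feedback.
    Works elementwise on scalars (or numpy arrays)."""
    corrected = grad + beta * residual      # re-inject scaled past error
    compressed = quantize_fn(corrected)     # any lossy compressor
    new_residual = corrected - compressed   # error carried to next step
    return compressed, new_residual
```

With `beta = 1`, the sum of the compressed outputs tracks the sum of the true gradients up to the final residual, which bounds the accumulated error; smaller `beta` damps how much past error is re-injected.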
The Count Sketch code is from CSVec.
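For intuition about what the Count Sketch does, here is a minimal self-contained version: each of `rows` independent hash/sign pairs scatters the vector's coordinates into `cols` buckets, and a coordinate is estimated as the median of its signed bucket values across rows. This is an illustration of the data structure only, not the CSVec API:

```python
import numpy as np

class CountSketch:
    """Minimal Count Sketch of a dense vector (illustrative sketch,
    not the CSVec interface)."""

    def __init__(self, dim, rows, cols, seed=0):
        rng = np.random.default_rng(seed)
        # Per-row hash: which bucket each coordinate lands in.
        self.buckets = rng.integers(0, cols, size=(rows, dim))
        # Per-row random signs, so colliding coordinates tend to cancel.
        self.signs = rng.choice([-1, 1], size=(rows, dim))
        self.table = np.zeros((rows, cols))

    def accumulate(self, vec):
        # Add the signed coordinates into their buckets, row by row.
        for r in range(self.table.shape[0]):
            np.add.at(self.table[r], self.buckets[r], self.signs[r] * vec)

    def query(self):
        # Each row gives one noisy estimate per coordinate;
        # the median across rows suppresses collision noise.
        rows = np.arange(self.table.shape[0])[:, None]
        est = self.signs * self.table[rows, self.buckets]
        return np.median(est, axis=0)
```

Because sketches of two vectors can simply be added table-by-table, this structure supports the merge-then-recover pattern that FetchSGD relies on for communication-efficient aggregation.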
References:
- FetchSGD: Communication-Efficient Federated Learning with Sketching
- Don't Waste Your Bits! Squeeze Activations and Gradients for Deep Neural Networks via TinyScript