
Comments (5)

jfrery commented on July 29, 2024

Hi @riemanli,

We would be very happy for you to contribute! Let me try to answer all your questions as best I can.

Before I start, let me explain how you can train models using torch and our compile_torch_model from concrete-ml. This compile function takes everything within the forward of the torch model and converts it to the FHE equivalent. So what we did is: instead of writing only the inference part, we also added the backward part. The model, instead of returning predictions, returns its updated parameters.

Here is the code in question -> https://github.com/zama-ai/concrete-ml/blob/main/src/concrete/ml/sklearn/_fhe_training_utils.py#L23-L97.

Then we build around this.
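To make the idea concrete, here is a minimal sketch of the pattern (the LogisticTrainingStep module and its hand-written SGD math are a simplified illustration, not the actual linked code; it assumes compile_torch_model accepts a tuple of example inputs for a multi-input model, as the linked training code relies on):

    import torch
    from concrete.ml.torch.compile import compile_torch_model

    class LogisticTrainingStep(torch.nn.Module):
        # One SGD step for binary logistic regression, written entirely
        # inside forward() so the whole step is traced into the circuit.
        def forward(self, x, y, weights, bias):
            probs = torch.sigmoid(x @ weights + bias)
            error = probs - y                        # dL/dlogits for BCE loss
            grad_w = x.t() @ error / x.shape[0]
            grad_b = error.mean()
            # Return the updated parameters instead of predictions.
            return weights - grad_w, bias - grad_b

    # Compiling traces forward() end to end, backward step included.
    x, y = torch.randn(8, 4), torch.randint(0, 2, (8, 1)).float()
    w, b = torch.zeros(4, 1), torch.zeros(1)
    quantized_module = compile_torch_model(LogisticTrainingStep(), (x, y, w, b), n_bits=6)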

What are the reasons that there is no encrypted training for other ML/DL models? Is it because there is some limitation in either concrete or concrete-ml that blocks this development? If so, what are those limitations?

Internally we did train a few-layer MLP following the method I described, and it works to some extent. The main limitation is probably the limited precision for gradients / weights. We currently have 6 bits of precision for the gradients, which is not ideal considering we use uniform quantization: as the model trains and converges, the gradients become orders of magnitude smaller.

That being said, there are different ways to implement FHE training. We currently return the parameters of the model after each iteration, but we could certainly imagine returning the gradients instead and accumulating them either in FHE or on the client's machine, which would allow us to have much more precision on the gradients.
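For instance, something like this hypothetical client-side loop (run_training_circuit and dequantize_gradient are stand-ins for illustration, not actual concrete-ml API):

    # Hypothetical scheme: the FHE circuit returns quantized gradients and
    # the client accumulates them in floating point, where precision is
    # not capped at 6 bits, before applying a single weight update.
    grad_sum = torch.zeros_like(weights)
    for x_batch, y_batch in batches:
        enc_grad = run_training_circuit(x_batch, y_batch, weights)  # stand-in
        grad_sum += dequantize_gradient(enc_grad)                   # stand-in
    weights = weights - learning_rate * grad_sum / len(batches)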

Some potential constraints I observed from the code that does encrypted learning on SGDClassifier:

(1) Parameter range has to be preset

  • Is this inevitable due to overflowing during FHE computation?

Yes, this is a rather counterintuitive parameter to expose, but we have not found a better way for now. We have very limited precision, so the declared range needs to be as tight as possible around the values the parameters can actually take.
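For reference, this is roughly how the preset range appears on the SGDClassifier side (written from memory, so double-check the argument names against the concrete-ml documentation):

    from concrete.ml.sklearn import SGDClassifier

    # The range the encrypted parameters may take is declared up front;
    # during training, parameter values are quantized and clipped to it.
    model = SGDClassifier(
        fit_encrypted=True,            # train on encrypted data
        parameters_range=(-1.0, 1.0),  # preset parameter range
        max_iter=50,
    )
    model.fit(X_train, y_train, fhe="simulate")  # X_train / y_train: your data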

(2) Floating point distribution of input has to be similar

  • Could you elaborate more on this?

I suppose you are referring to the fine-tuning part where we reuse the same FHE training circuit to learn a different dataset.

Here we just meant that the min/max values of the inputs should be roughly the same. If they are not, there will be a loss of information (because we clip to the previous min/max) or a loss of precision (because we would not effectively use the available quantizer levels).
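A small illustration of both failure modes, using a toy 6-bit uniform quantizer (not concrete-ml's actual quantizer classes):

    import torch

    # Toy 6-bit uniform quantizer calibrated on [-1, 1].
    lo, hi, n_bits = -1.0, 1.0, 6
    step = (hi - lo) / (2**n_bits - 1)

    def quantize(x):
        return torch.round((torch.clamp(x, lo, hi) - lo) / step)

    # Loss of information: values outside the calibration range are clipped.
    quantize(torch.tensor([0.5, 1.5]))   # 1.5 is treated as 1.0 (code 63)

    # Loss of precision: inputs living in [-0.1, 0.1] only ever reach
    # codes 28 through 35, i.e. 8 of the 64 available levels.
    quantize(torch.tensor([-0.1, 0.1]))  # tensor([28., 35.])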

(3) Learning rate == 1

  • Does it mean we cannot have arbitrary learning rates?

The issue with learning rates is related to how numbers are represented and manipulated. Let me try to explain this clearly.

First, you have to know that when we do an addition or subtraction, both numbers must use the same level of precision.
If they don't match, the less precise number is "upscaled" to match the more precise one. The learning rate is first multiplied with the gradient; at this point everything is fine.

The issue occurs when updating the weights by subtracting the gradient. If the learning rate is small, the gradients become much more precise than the weights, which forces the weights to be upscaled to match the gradients' precision. Since the weights are 6 bits and their range is predefined, with a small learning rate the change to the weights becomes smaller than one "step" in the quantized representation (the input of the circuit). The parameter quantizer essentially rounds this tiny change to zero, resulting in no actual update to the weights.

So this means the way we currently do it cannot support small learning rates. But as I said above, the method can be updated, e.g., by accumulating gradients, to mitigate this limitation.
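To put numbers on this (illustrative arithmetic only, assuming the weights were calibrated on [-1, 1]):

    # One quantization step for 6-bit weights calibrated on [-1, 1]:
    n_bits = 6
    quant_step = (1.0 - (-1.0)) / (2**n_bits - 1)  # ~0.032

    lr, grad = 0.01, 0.1
    update = lr * grad                             # 0.001, well below one step
    round(update / quant_step)                     # 0 -> the weight never moves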


riemanli commented on July 29, 2024

Thanks so much for such a detailed and clear explanation, @jfrery! This is great and gives me enough to get started. I will reach out again when I encounter issues or questions during development.


riemanli commented on July 29, 2024

Hi @jfrery, what's the best way to add a backward pass for an NN? I don't want to reinvent the wheel for backward, so I tried to embed the Torch optimizer into forward, but it seems the Concrete compiler doesn't support that.

My experiment code:

import torch
from torch import nn

class TinyCNNTrainer(nn.Module):

    def __init__(self, n_classes) -> None:
        super().__init__()
        self.trainee = TinyCNN(n_classes)  # TinyCNN defined elsewhere
        self.loss = nn.CrossEntropyLoss()
        self.optimizer = torch.optim.Adam(self.trainee.parameters())

    def forward(self, x, y):
        self.trainee.train()

        output = self.trainee(x)

        # Backward pass and optimizer step embedded directly in forward
        loss = self.loss(output, y.reshape(-1).long())
        loss.backward()
        self.optimizer.step()
        self.optimizer.zero_grad()

        return output


jfrery commented on July 29, 2024

Hi @riemanli,

You've got the right idea on how to do this but there are a few problems:

  • the trainer you built is converted to an FHE circuit that takes x and y and returns self.trainee(x). The gradient step you apply has no effect on that output, so it won't be part of the circuit.

When you use compile_torch_model, what happens is that we trace from the inputs to the outputs and create an FHE circuit that reproduces those outputs. Anything that does not impact the outputs is simply discarded.

If you want this to work, you will need to provide the parameters of the model as inputs of the forward and output either the gradients or the updated parameters.

This is what our logistic regression training forward looks like:

    def forward(
        self,
        features: torch.Tensor,
        targets: torch.Tensor,
        weights: torch.Tensor,
        bias: torch.Tensor,
    ) -> Tuple[torch.Tensor, torch.Tensor]:
  • In theory, your approach of using the optimizer directly in the forward should work, and that's something we also tried on our side. In practice, however, autograd creates lots of nodes in the ONNX graph, some of which might not be supported in concrete-ml. In the end we rewrote the backward pass by hand to ease our life, but eventually your approach should be preferred as it is much more scalable to any kind of model.

I am curious to see what problem you encounter.

