zhuangdizhu / fedgen

Code and data accompanying the FedGen paper

Python 98.65% Shell 1.35%


fedgen's Issues

Question about the function 'visualize_images' in 'serverpFedGen.py'

It seems that the function 'visualize_images' does not work when using commands.

Question about main_plot.py

Hello,
Sorry, I have a problem with main_plot.py.

The problem:
FileNotFoundError: [Errno 2] No such file or directory: 'figs\Mnist/ratio0.5\Mnist-ratio0.5.png'

I hope you can take a look when you have time. I am new to this area. Thank you!
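
One possible cause, assuming the error is raised while saving the figure, is that the directory `figs\Mnist/ratio0.5` was never created. A minimal sketch of creating the parent directory before saving follows; the helper name `ensure_parent_dir` is hypothetical and not part of the repository.

```python
import os
import tempfile

def ensure_parent_dir(path):
    """Create the directory that will contain `path` if it is missing (hypothetical helper)."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    return path

# Build the path with os.path.join so separators are consistent on every OS.
base = tempfile.mkdtemp()
fig_path = os.path.join(base, "figs", "Mnist", "ratio0.5", "Mnist-ratio0.5.png")
ensure_parent_dir(fig_path)
parent_exists = os.path.isdir(os.path.dirname(fig_path))
```

Calling such a helper just before the figure is written (for example before `plt.savefig(fig_path)`) should avoid the error if a missing directory is indeed the cause.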

Unable to perform Mnist experiments

When I run "python main.py --dataset Mnist-alpha0.01-ratio0.05 --algorithm FedAvg --batch_size 32 --num_glob_iters 200 --local_epochs 20 --num_users 10 --lamda 1 --learning_rate 0.01 --model cnn --personal_learning_rate 0.01 --times 3", I get the following error. How can I solve it?

Average Global Accurancy = 0.0950, Loss = 2.31.
Traceback (most recent call last):
  File "C:\kust\xuesu\code\FedGen-main\FedGen-main\FLAlgorithms\users\userbase.py", line 163, in get_next_train_batch
    (X, y) = next(self.iter_trainloader)
  File "C:\Users\Administrator\anaconda3\envs\FedGen\lib\site-packages\torch\utils\data\dataloader.py", line 633, in __next__
    data = self._next_data()
  File "C:\Users\Administrator\anaconda3\envs\FedGen\lib\site-packages\torch\utils\data\dataloader.py", line 676, in _next_data
    index = self._next_index() # may raise StopIteration
  File "C:\Users\Administrator\anaconda3\envs\FedGen\lib\site-packages\torch\utils\data\dataloader.py", line 623, in _next_index
    return next(self._sampler_iter) # may raise StopIteration
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\kust\xuesu\code\FedGen-main\FedGen-main\main.py", line 85, in <module>
    main(args)
  File "C:\kust\xuesu\code\FedGen-main\FedGen-main\main.py", line 42, in main
    run_job(args, i)
  File "C:\kust\xuesu\code\FedGen-main\FedGen-main\main.py", line 37, in run_job
    server.train(args)
  File "C:\kust\xuesu\code\FedGen-main\FedGen-main\FLAlgorithms\servers\serveravg.py", line 35, in train
    user.train(glob_iter, personalized=self.personalized) #* user.train_samples
  File "C:\kust\xuesu\code\FedGen-main\FedGen-main\FLAlgorithms\users\useravg.py", line 23, in train
    result =self.get_next_train_batch(count_labels=count_labels)
  File "C:\kust\xuesu\code\FedGen-main\FedGen-main\FLAlgorithms\users\userbase.py", line 167, in get_next_train_batch
    (X, y) = next(self.iter_trainloader)
  File "C:\Users\Administrator\anaconda3\envs\FedGen\lib\site-packages\torch\utils\data\dataloader.py", line 633, in __next__
    data = self._next_data()
  File "C:\Users\Administrator\anaconda3\envs\FedGen\lib\site-packages\torch\utils\data\dataloader.py", line 676, in _next_data
    index = self._next_index() # may raise StopIteration
  File "C:\Users\Administrator\anaconda3\envs\FedGen\lib\site-packages\torch\utils\data\dataloader.py", line 623, in _next_index
    return next(self._sampler_iter) # may raise StopIteration
StopIteration
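
The second StopIteration is raised while the first one is being handled, which suggests that the retry also found nothing to iterate over, i.e. the loader produced zero batches. One way this can happen, assuming the loader drops incomplete batches (an assumption, not confirmed by the traceback), is a user whose local dataset is smaller than --batch_size. A sketch in plain Python follows; `num_batches` and `next_batch` are hypothetical names, not functions from the repository.

```python
def num_batches(num_samples, batch_size, drop_last):
    """Number of batches a loader yields per pass over the data."""
    if drop_last:
        return num_samples // batch_size
    return -(-num_samples // batch_size)  # ceiling division

def next_batch(batches, iterator):
    """Return (batch, iterator), restarting the iterator once when it is exhausted.

    Raises ValueError instead of a bare StopIteration when there are no batches at all.
    """
    try:
        return next(iterator), iterator
    except StopIteration:
        if not batches:
            raise ValueError("empty loader: no batches to draw from") from None
        iterator = iter(batches)
        return next(iterator), iterator

# A user with 20 samples and batch size 32 gets zero batches when drop_last is set.
empty_user = num_batches(20, 32, drop_last=True)
kept_user = num_batches(20, 32, drop_last=False)
```

If this is the cause, lowering --batch_size or keeping the last incomplete batch would be things to try.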

Training with CIFAR-10

Thank you for the great work.

Has anyone tried training with CIFAR-10? I followed the Mnist setup: I replaced the Mnist data loader with one for CIFAR-10, changed the input dimension from 1 to 3, and kept the same models. However, the result with FedAvg is poor (about 31%).

Are there any special settings needed when running experiments on a new dataset? Thank you.

Run the code on a CUDA device

It seems that the code does not support CUDA?

--device "cuda" can be set, but the code always seems to run on the CPU.

Thanks

Partial Parameter Sharing Not Supported

It seems that the implementation does not perform partial parameter sharing. As shown on line 103 of serverpFedGen.py, the partial parameter is set to False by default, but the pseudo-code in the paper shows that only the classifier layer of the user's model is shared. Is this a bug, or have I misunderstood something in the code?
self.aggregate_parameters()

Network configs: [6, 16, 'F']

Hi, I'm unable to run any of the files.
This is what was printed. What does "Network configs: [6, 16, 'F']" mean?
python main.py --dataset Mnist-alpha0.1-ratio0.5 --algorithm FedDistll-FL --batch_size 32 --num_glob_iters 200 --local_epochs 20 --num_users 10 --lamda 1 --learning_rate 0.01 --model cnn --personal_learning_rate 0.01 --times 3

Summary of training process:
Algorithm: FedDistll-FL
Batch size: 32
Learing rate : 0.01
Ensemble learing rate : 0.0001
Average Moving : 1.0
Subset of users : 10
Number of global rounds : 200
Number of local rounds : 20
Dataset : Mnist-alpha0.1-ratio0.5
Local Model : cnn
Device : cpu

     [ Start training iteration 0 ]

Creating model for mnist
Network configs: [6, 16, 'F']
Algorithm FedDistll-FL has not been implemented.
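
The last log line indicates that the run stopped because the algorithm name was not recognized; `FedDistll-FL` in the command looks like a misspelling of `FedDistill`, a name that appears in other issues on this page. A hedged sketch of checking the name up front with the standard library's `difflib` follows. The list `KNOWN_ALGORITHMS` is an assumption based only on names mentioned in these issues, and `suggest_algorithm` is a hypothetical helper.

```python
import difflib

# Assumed list, taken from algorithm names mentioned in the issues on this page.
KNOWN_ALGORITHMS = ["FedAvg", "FedProx", "FedDistill", "FedGen"]

def suggest_algorithm(name):
    """Return the closest known algorithm name, or None if nothing is similar."""
    base = name.split("-")[0]  # drop a suffix such as "-FL"
    if base in KNOWN_ALGORITHMS:
        return base
    matches = difflib.get_close_matches(base, KNOWN_ALGORITHMS, n=1, cutoff=0.6)
    return matches[0] if matches else None

suggestion = suggest_algorithm("FedDistll-FL")
```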

Noise generation in generator.py (line 63)

It seems that torch.rand generates [0,1) uniformly based on the official documentation instead of standard Gaussian. Is this intended? Thanks
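
For reference, `torch.rand` samples uniformly from [0, 1) and `torch.randn` samples from a standard normal distribution. The difference in mean and spread can be illustrated without PyTorch, using the standard library's `random.random` and `random.gauss` as stand-ins. This is only an analogy, not the repository's code.

```python
import random
import statistics

random.seed(0)
n = 20_000

uniform = [random.random() for _ in range(n)]          # analogue of torch.rand
gaussian = [random.gauss(0.0, 1.0) for _ in range(n)]  # analogue of torch.randn

# Uniform on [0, 1): mean 1/2, standard deviation sqrt(1/12), about 0.289.
uniform_mean, uniform_std = statistics.fmean(uniform), statistics.pstdev(uniform)
# Standard normal: mean 0, standard deviation 1.
gauss_mean, gauss_std = statistics.fmean(gaussian), statistics.pstdev(gaussian)
```

Uniform noise is never negative and has a nonzero mean, so swapping one for the other changes the input distribution the generator sees.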

Cannot utilize GPU for FedGen

I ran the example FedGen experiment on Mnist from README.md with the option "--device cuda", but found that no process was placed on the GPU. Looking further into the code, it seems that "args.device" is not handled in the scripts. I also added "os.environ["CUDA_VISIBLE_DEVICES"] = '0'" to main.py, but the model is still placed only on the CPU. How can I utilize the GPU for FedGen? I really appreciate your help!

Question about the implementation of "FedProx"

Hi.

Does your implementation of FedProx correspond to Algorithm 2 in the original FedProx paper? More specifically, the update in lines 53-54 of "fedoptimizer.py" seems a little strange. In particular, what does lambda mean in the FedProx algorithm?

The update formula, as I understand it, should be:
p.data = p.data - group['lr'] * (p.grad.data + group['mu'] * (p.data - pstar.data.clone()))

Looking forward to your reply.
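
The update proposed in this issue is a gradient step on the local loss plus the gradient of a proximal term (mu/2) * ||p - pstar||^2, which is mu * (p - pstar). A minimal sketch on plain Python lists follows, assuming `lr` is the learning rate, `mu` the proximal coefficient, and `pstar` the global parameters received from the server. `fedprox_step` is a hypothetical name, and this is not the code in "fedoptimizer.py".

```python
def fedprox_step(params, grads, pstar, lr, mu):
    """One FedProx-style step: p <- p - lr * (grad + mu * (p - pstar))."""
    return [p - lr * (g + mu * (p - ps)) for p, g, ps in zip(params, grads, pstar)]

# With zero gradient, the proximal term alone pulls p toward the global value pstar.
updated = fedprox_step(params=[1.0, -2.0], grads=[0.0, 0.0], pstar=[0.0, 0.0], lr=0.1, mu=1.0)
```

With `mu=0.0` the step reduces to plain gradient descent, which is one quick way to check an implementation.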

Does not work when only sharing the classifier

fedfn, fedntd folder in the image_classification/gfl.

No issue.

Trainloader is not shuffled

FedAvg performs worse than FedGen simply because the Trainloader does not shuffle the data. After fixing this bug, FedGen is no longer as effective as FedAvg.

Wrong tensor type error

If there are wrong tensor type errors when running experiments with FedGen algorithm, see changes in #3

Celeb dataset generation script

Hi Zhuang,

Can you share the script used to generate the Celeb data? Thanks.

generate_niid_dirichlet raises an error

The user_output_logp argument used during user training seems a bit odd

FedGen/FLAlgorithms/users/userpFedGen.py

Line 58 in 0bfd4e1

 user_latent_loss= generative_beta * self.ensemble_loss(user_output_logp, target_p) 

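In this loop, the first time user_output_logp is used, it is the value defined on line 47, outside the loop. In later iterations, it is the value defined on line 64.
The former is the label of the local training batch, while the latter is the label of a randomly chosen batch. Isn't this a bit odd?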

Got the reply

Thanks

Error: RuntimeError: Can't call `numpy()` on Tensor that requires grad.

Full error message: RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.

Added the following as line 227 to serverbase.py to resolve:
test_losses = [t.detach() for t in test_losses]

Python version: 3.8.6

plot problem

I think that in plot_utils.py, the variable 'all_curves' used outside the loop holds only the last algorithm's results. As a result, when several algorithms are listed in the config, the range of the figure follows the last algorithm and the curves of the other algorithms are cut off.

max_acc = np.max([max_acc, np.max(all_curves) ]) + 4e-2

python main_plot.py --dataset EMnist-alpha0.1-ratio0.1 --algorithms FedAvg,FedGen,FedProx,FedDistill --batch_size 32 --local_epochs 20 --num_users 10 --num_glob_iters 200 --plot_legend 1
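
One way to make the axis limit independent of which algorithm is processed last, sketched under the assumption that each algorithm contributes a list of accuracy curves, is to update the running maximum inside the loop. The dictionary `results` and its values are made-up illustration data, not experiment results, and the loop is not the code in plot_utils.py.

```python
# Made-up accuracy curves per algorithm, for illustration only.
results = {
    "FedAvg": [[0.10, 0.55, 0.80], [0.12, 0.50, 0.78]],
    "FedGen": [[0.15, 0.70, 0.92], [0.11, 0.66, 0.90]],
    "FedProx": [[0.09, 0.52, 0.79]],
}

max_acc = 0.0
for algorithm, all_curves in results.items():
    # Update the limit for every algorithm, not only the last one in the loop.
    curves_max = max(max(curve) for curve in all_curves)
    max_acc = max(max_acc, curves_max)

max_acc += 4e-2  # same margin as in the line quoted in this issue
```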

Question about FedProx

Hi.

The update formula I understand should be :
p.data = p.data - group['lr'] * (p.grad.data + group['mu'] * (p.data - pstar.data.clone()))

Looking forward to your reply.

Redundant loss

I wonder why the user_latent_loss is not mentioned in your paper.

Reproduce "FedDF" baseline

Thank you for open-sourcing your project. I notice that "FedDF" (Ensemble Distillation for Robust Model Fusion in Federated Learning) is one of your baselines in your paper, however, you provide code for only FedAvg, FedProx, FedDistill, and FedGen. Could you please help me reproduce the results of FedDF? I really appreciate your help.

Can't run EMNIST experiment

When I ran the EMNIST experiment after generation of emnist dataset I got:

(pt) wangshu@ubuntu:~/projects/FedGen$ CUDA_VISIBLE_DEVICES=3 python main.py --dataset EMnist-alpha0.1-ratio0.1 --algorithm FedGen --batch_size 32 --local_epochs 20 --num_users 10 --lamda 1 --model cnn --learning_rate 0.01 --personal_learning_rate 0.01 --num_glob_iters 200 --times 3 
================================================================================
Summary of training process:
Algorithm: FedGen
Batch size: 32
Learing rate       : 0.01
Ensemble learing rate       : 0.0001
Average Moving       : 1.0
Subset of users      : 10
Number of global rounds       : 200
Number of local rounds       : 20
Dataset       : EMnist-alpha0.1-ratio0.1
Local Model       : cnn
Device            : cpu
================================================================================


         [ Start training iteration 0 ]           


Creating model for emnist
Network configs: [6, 16, 'F']
Dataset emnist
/home/wangshu/miniconda3/envs/pt/lib/python3.9/site-packages/torch/nn/_reduction.py:42: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead.
  warnings.warn(warning.format(ret))
Build layer 57 X 256
Build last layer 256 X 32
ensemble_lr: 0.0001
ensemble_batch_size: 128
unique_labels: 25
latent_layer_idx: -1
label embedding 0
ensemeble learning rate: 0.0001
ensemeble alpha = 1, beta = 0, eta = 1
generator alpha = 10, beta = 1
Number of Train/Test samples: 12480 8120
Data from 20 users in total.
Finished creating FedAvg server.


-------------Round number:  0  -------------


Traceback (most recent call last):
  File "/home/wangshu/projects/FedGen/main.py", line 85, in <module>
    main(args)
  File "/home/wangshu/projects/FedGen/main.py", line 42, in main
    run_job(args, i)
  File "/home/wangshu/projects/FedGen/main.py", line 37, in run_job
    server.train(args)
  File "/home/wangshu/projects/FedGen/FLAlgorithms/servers/serverpFedGen.py", line 78, in train
    self.evaluate()
  File "/home/wangshu/projects/FedGen/FLAlgorithms/servers/serverbase.py", line 226, in evaluate
    test_ids, test_samples, test_accs, test_losses = self.test(selected=selected)
  File "/home/wangshu/projects/FedGen/FLAlgorithms/servers/serverbase.py", line 165, in test
    ct, c_loss, ns = c.test()
  File "/home/wangshu/projects/FedGen/FLAlgorithms/users/userbase.py", line 137, in test
    loss += self.loss(output, y)
  File "/home/wangshu/miniconda3/envs/pt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wangshu/miniconda3/envs/pt/lib/python3.9/site-packages/torch/nn/modules/loss.py", line 216, in forward
    return F.nll_loss(input, target, weight=self.weight, ignore_index=self.ignore_index, reduction=self.reduction)
  File "/home/wangshu/miniconda3/envs/pt/lib/python3.9/site-packages/torch/nn/functional.py", line 2388, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
IndexError: Target 25 is out of bounds.
(pt) wangshu@ubuntu:~/projects/FedGen$

PyTorch 1.8.1, Python 3.9.4.
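
The log reports "unique_labels: 25", so the classifier presumably accepts class indices 0 to 24, while the loss received a target of 25. One possible workaround, assuming the dataset's labels do not run contiguously from zero (not confirmed here), is to remap them before training. `remap_labels` is a hypothetical helper, and the labels below are illustrative only.

```python
def remap_labels(labels):
    """Map arbitrary integer labels to contiguous indices 0..K-1, preserving order."""
    mapping = {label: idx for idx, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping

# Illustrative labels that do not start at zero.
raw = [1, 3, 25, 3, 1]
remapped, mapping = remap_labels(raw)
num_classes = len(mapping)
in_range = all(0 <= y < num_classes for y in remapped)
```

The same mapping would have to be applied to both the training and the test split so that class indices stay consistent.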
