fl-bench's Introduction

Evaluating Federated Learning Methods.

Realizing Your Brilliant Ideas.

Having Fun with Federated Learning.

🎉 FL-bench can now run FL training in parallel (with the help of Ray)! 🎉

Methods 🧬

Traditional FL Methods
Personalized FL Methods
FL Domain Generalization Methods

Environment Preparation 🧩

Choose one of the following options.

PyPI 🐍

pip install -r .environment/requirements.txt

Conda 💻

conda env create -f .environment/environment.yml

Poetry 🎶

# For users in mainland China
cd .environment && poetry install --no-root

# For users outside mainland China
cd .environment && sed -i "10,14d" pyproject.toml && poetry lock --no-update && poetry install --no-root

Docker 🐳

# For users in mainland China
docker pull registry.cn-hangzhou.aliyuncs.com/karhoutam/fl-bench:master

# For users outside mainland China
docker pull ghcr.io/karhoutam/fl-bench:master
# or
docker pull docker.io/karhoutam/fl-bench:master

# Example: create and start a container
docker run -it --name fl-bench -v path/to/FL-bench:/root/FL-bench --privileged --gpus all ghcr.io/karhoutam/fl-bench:master

Easy Run 🏃‍♂️

All method classes inherit from FedAvgServer and FedAvgClient. To understand the entire workflow and the details of the variable settings, check src/server/fedavg.py and src/client/fedavg.py.

Step 1. Generate FL Dataset

# Partition MNIST across 100 clients according to Dir(0.1)
python generate_data.py -d mnist -a 0.1 -cn 100

For full details on the methods of generating federated datasets, check data/README.md.
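
As a rough illustration of what a Dir(0.1) partition means, the sketch below splits the sample indices of each class across clients using proportions drawn from a Dirichlet distribution. It is not FL-bench's implementation (see data/README.md for that); the function name `dirichlet_partition` and its parameters are assumptions made for this example, and it uses only the standard library.

```python
import random
from collections import defaultdict


def dirichlet_partition(labels, client_num, alpha, seed=42):
    """Hypothetical helper: split indices across clients, class by class,
    with per-class client proportions drawn from Dirichlet(alpha)."""
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    clients = [[] for _ in range(client_num)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # A Dirichlet draw equals Gamma(alpha, 1) draws normalized to sum to 1.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(client_num)]
        total = sum(gammas)
        # Convert cumulative proportions into cut points over this class.
        cuts, acc = [], 0.0
        for g in gammas[:-1]:
            acc += g / total
            cuts.append(int(acc * len(indices)))
        start = 0
        for client_id, end in enumerate(cuts + [len(indices)]):
            clients[client_id].extend(indices[start:end])
            start = end
    return clients


# Toy labels standing in for a 10-class dataset.
toy_labels = [i % 10 for i in range(1000)]
parts = dirichlet_partition(toy_labels, client_num=100, alpha=0.1)
print(sorted(len(p) for p in parts)[-5:])  # sizes of the five largest clients
```

A small alpha such as 0.1 concentrates each class on a few clients, so most clients see only a handful of classes; a larger alpha gives a more even split.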

Step 2. Run Experiment

python main.py <method> [your_config_file.yml] [method_args...]

❗ The method name must match the .py file name in src/server.

# Run FedAvg with default settings. 
python main.py fedavg

How To Customize Experiment Arguments 🤖

  • By modifying config file
  • By explicitly setting in CLI, e.g., python main.py fedprox config/my_cfg.yml --mu 0.01.
  • By modifying the default value in src/utils/constants.py/DEFAULT_COMMON_ARGS or src/server/<method>.py/get_<method>_args()

⚠ For the same FL method argument, the priority of argument setting is CLI > Config file > Default value.

For example, the default value of fedprox.mu is 1,

def get_fedprox_args(args_list=None) -> Namespace:
    parser = ArgumentParser()
    parser.add_argument("--mu", type=float, default=1.0)
    return parser.parse_args(args_list)

and you set

# your_config.yml
...
fedprox:
  mu: 0.01

in your config file. The following runs then resolve fedprox.mu to the values shown in the comments:

python main.py fedprox                           # fedprox.mu = 1
python main.py fedprox your_config.yml           # fedprox.mu = 0.01
python main.py fedprox your_config.yml --mu 10   # fedprox.mu = 10
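
The priority rule can be sketched in a few lines. This is only an illustration of the CLI > config file > default order, not FL-bench's actual merging code: `FEDPROX_DEFAULTS`, `parse_explicit_cli_args`, and `resolve_fedprox_args` are hypothetical names, and the config file is represented by a plain dict so that the example needs no YAML parser.

```python
from argparse import SUPPRESS, ArgumentParser

# Hypothetical default, mirroring the --mu default shown above.
FEDPROX_DEFAULTS = {"mu": 1.0}


def parse_explicit_cli_args(cli_args):
    """Return only the options the user actually typed on the command line."""
    # argument_default=SUPPRESS leaves unspecified options out of the namespace.
    parser = ArgumentParser(argument_default=SUPPRESS)
    parser.add_argument("--mu", type=float)
    return vars(parser.parse_args(cli_args))


def resolve_fedprox_args(config_section, cli_args):
    """Apply the priority: CLI > config file > default value."""
    resolved = dict(FEDPROX_DEFAULTS)                   # lowest priority
    resolved.update(config_section)                     # config file overrides defaults
    resolved.update(parse_explicit_cli_args(cli_args))  # CLI overrides both
    return resolved


print(resolve_fedprox_args({}, []))                        # {'mu': 1.0}
print(resolve_fedprox_args({"mu": 0.01}, []))              # {'mu': 0.01}
print(resolve_fedprox_args({"mu": 0.01}, ["--mu", "10"]))  # {'mu': 10.0}
```

Suppressing parser defaults is one way to tell "the user typed --mu" apart from "the parser filled in a default", which is what lets the config file win over the default but lose to the CLI.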

Monitor 📈

  1. Run python -m visdom.server in a terminal.
  2. Set the visible argument to true.
  3. Open localhost:8097 in your browser.

Using Ray for Parallel Training

You need to set

# your_config_file.yml
mode: parallel
parallel:
  num_workers: 2 # any integer larger than 1
  ...
...

to enable parallel training, which can substantially speed up your experiments.

Creating a Ray Cluster

A Ray cluster is created implicitly each time you run python main.py <method> .... Alternatively, you can launch a cluster manually so that one is not created every time you run an experiment.

# your_config_file.yml
mode: parallel
parallel:
  ray_cluster_addr: null
  ...
...

ray start --head [OPTIONS]

Arguments 🔧

All common arguments have default values. Check DEFAULT_COMMON_ARGS in src/utils/constants.py for the full details of the common arguments.

⚠ Common arguments cannot be set via CLI.

You can also write your own .yml config file. A template is provided in config, and I recommend saving your config files there as well.

One example: python main.py fedavg config/template.yaml [cli_method_args...]

For the default values of method-specific arguments, check the corresponding FL-bench/src/server/<method>.py.

Argument | Type | Description
dataset | str | Name of the dataset the experiment runs on.
model | str | Model backbone used in the experiment.
seed | int | Random seed for the experiment.
join_ratio | float | Ratio of (clients per round) / (total number of clients).
global_epoch | int | Number of global epochs, also called communication rounds.
local_epoch | int | Number of local epochs for client local training.
finetune_epoch | int | Number of epochs for clients to fine-tune their models before testing.
test_interval | int | Interval (in rounds) between tests on clients.
eval_test | bool | Non-zero value for evaluating on joined clients' test sets before and after local training.
eval_val | bool | Non-zero value for evaluating on joined clients' validation sets before and after local training.
eval_train | bool | Non-zero value for evaluating on joined clients' training sets before and after local training.
optimizer | dict | Client-side optimizer. Arguments are the same as for the optimizers in torch.optim.
lr_scheduler | dict | Client-side learning rate scheduler. Arguments are the same as for the schedulers in torch.optim.lr_scheduler.
verbose_gap | int | Interval (in rounds) between displays of clients' training performance in the terminal.
batch_size | int | Data batch size for client local training.
use_cuda | bool | Non-zero value indicates that tensors are placed on the GPU.
visible | bool | Non-zero value for using Visdom to monitor algorithm performance on localhost:8097.
straggler_ratio | float | Ratio of stragglers (set in [0, 1]). Stragglers do not perform full-epoch local training like normal clients. Their local epoch is randomly selected from the range [straggler_min_local_epoch, local_epoch).
straggler_min_local_epoch | int | Minimum local epoch for stragglers.
external_model_params_file | str | Relative file path of external model parameters. Please confirm yourself that the shapes of the parameters are compatible with the model. ⚠ This feature is enabled only when unique_model=False, which is pre-defined by each FL method.
save_log | bool | Non-zero value for saving the algorithm's running log in out/<method>/<start_time>.
save_model | bool | Non-zero value for saving the output model(s) parameters in out/<method>/<start_time>.pt.
save_fig | bool | Non-zero value for saving the accuracy curves shown on Visdom into a .pdf file at out/<method>/<start_time>.
save_metrics | bool | Non-zero value for saving metrics stats into a .csv file at out/<method>/<start_time>.
viz_win_name | str | Custom Visdom window name (active when visible is set to a non-zero value).
check_convergence | bool | Non-zero value for checking convergence after training.
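
To make join_ratio, straggler_ratio, and straggler_min_local_epoch more concrete, here is a minimal sketch of how such settings could translate into a per-round client selection and per-client local epochs. It is an assumption-laden illustration, not FL-bench's code: all three function names are hypothetical, and how FL-bench actually selects clients and stragglers is not described in this document.

```python
import random


def sample_round_clients(client_num, join_ratio, rng):
    """Pick the clients that join one round (at least one client)."""
    k = max(1, int(client_num * join_ratio))
    return rng.sample(range(client_num), k)


def pick_stragglers(client_num, straggler_ratio, rng):
    """Mark a fixed fraction of all clients as stragglers."""
    return set(rng.sample(range(client_num), int(client_num * straggler_ratio)))


def local_epoch_for(client_id, stragglers, local_epoch, straggler_min_local_epoch, rng):
    """Stragglers train for a random epoch count in [min, local_epoch)."""
    if client_id in stragglers and straggler_min_local_epoch < local_epoch:
        # randrange excludes the upper bound, matching the half-open range.
        return rng.randrange(straggler_min_local_epoch, local_epoch)
    return local_epoch


rng = random.Random(42)
stragglers = pick_stragglers(client_num=100, straggler_ratio=0.2, rng=rng)
for cid in sample_round_clients(client_num=100, join_ratio=0.1, rng=rng):
    epochs = local_epoch_for(cid, stragglers, local_epoch=5,
                             straggler_min_local_epoch=1, rng=rng)
    print(cid, "straggler" if cid in stragglers else "normal", epochs)
```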

Arguments of Parallel Training 👯‍♂️

Argument | Type | Description
num_workers | int | Number of parallel workers. Must be an integer larger than 1.
ray_cluster_addr | str | IP address of the selected Ray cluster. Defaults to null, which means Ray builds a new cluster every time you run an experiment and destroys it at the end. More details can be found in the official docs.
num_cpus, num_gpus | int | Amount of computational resources to allocate. Defaults to null, which means all.

Supported Models 🚀

This benchmark supports a number of common models integrated in Torchvision:

  • ResNet family
  • EfficientNet family
  • DenseNet family
  • MobileNet family
  • LeNet5 ...

🤗 You can define your own custom model by filling in the CustomModel class in src/utils/models.py and use it by setting model to custom when running.

Supported Datasets 🎨

Regular Image Datasets

  • MNIST (1 x 28 x 28, 10 classes)

  • CIFAR-10/100 (3 x 32 x 32, 10/100 classes)

  • EMNIST (1 x 28 x 28, 62 classes)

  • FashionMNIST (1 x 28 x 28, 10 classes)

  • Synthetic Dataset

  • FEMNIST (1 x 28 x 28, 62 classes)

  • CelebA (3 x 218 x 178, 2 classes)

  • SVHN (3 x 32 x 32, 10 classes)

  • USPS (1 x 16 x 16, 10 classes)

  • Tiny-ImageNet-200 (3 x 64 x 64, 200 classes)

  • CINIC-10 (3 x 32 x 32, 10 classes)

Domain Generalization Image Datasets

Medical Image Datasets

Customization Tips 💡

Implementing FL Method

The package() method of the server-side class assembles all parameters the server needs to send to clients. Similarly, package() of the client-side class assembles the parameters clients need to send back to the server. You should always call super().package() in your override.

  • Consider inheriting your method classes from FedAvgServer and FedAvgClient to make the most of FL-bench's workflow.

  • To customize the server-side process, consider overriding package() and aggregate().

  • To customize client-side training, consider overriding fit() or package().

You can find all details in FedAvgClient and FedAvgServer, which are the bases of all implementations in FL-bench.
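
The override pattern described above can be sketched as follows. The base class here is a hypothetical stand-in, not the real FedAvgClient from src/client/fedavg.py, and the assumption that package() returns a dict is made only for this illustration; check the real classes for the actual payload format.

```python
class FedAvgClientSketch:
    """Hypothetical stand-in for FedAvgClient (the real one is in src/client/fedavg.py)."""

    def __init__(self, client_id):
        self.client_id = client_id
        self.weights = [0.0, 0.0]

    def package(self):
        # Assumed payload format: a dict of everything sent back to the server.
        return {"client_id": self.client_id, "weights": list(self.weights)}


class MyMethodClient(FedAvgClientSketch):
    """A custom client that sends one extra, method-specific item."""

    def __init__(self, client_id):
        super().__init__(client_id)
        self.control_variate = [0.1, -0.1]  # hypothetical extra state

    def package(self):
        # Start from the base payload so nothing the workflow expects is lost.
        payload = super().package()
        payload["control_variate"] = list(self.control_variate)
        return payload


print(sorted(MyMethodClient(client_id=3).package()))
# ['client_id', 'control_variate', 'weights']
```

Extending the result of super().package() rather than building a new payload keeps the override compatible with whatever the base workflow already sends.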

Integrating Dataset

  • Inherit your own dataset class from BaseDataset in data/utils/datasets.py and add your class to the DATASETS dict.

Customizing Model

  • The CustomModel class is provided in src/utils/models.py; you only need to define your model architecture.
  • If you want to use your customized model within FL-bench's workflow, the base and classifier must be defined. (Tip: you can define one of them as torch.nn.Identity() to bypass it.)

Citation 🧐

@software{Tan_FL-bench,
  author = {Tan, Jiahao and Wang, Xinpeng},
  license = {GPL-2.0},
  title = {{FL-bench: A federated learning benchmark for solving image classification tasks}},
  url = {https://github.com/KarhouTam/FL-bench}
}

@misc{tan2023pfedsim,
  title={pFedSim: Similarity-Aware Model Aggregation Towards Personalized Federated Learning}, 
  author={Jiahao Tan and Yipeng Zhou and Gang Liu and Jessie Hui Wang and Shui Yu},
  year={2023},
  eprint={2305.15706},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

fl-bench's Issues

Are the running_var, running_mean, and num_batches_tracked keys trainable?

Hi, I recently found your wonderful FL-bench repository.
I have a question about your code structure.

I'm using Python 3.10 and torch 1.13.1.

In FL-bench/src/client/fedavg.py,

line 73 if not param.requires_grad

generates the running_mean, running_var, and num_batches_tracked keys.

As far as I know, FedAvg only updates weight and bias, and keeps the batchnorm statistics (running_mean, running_var, num_batches_tracked) unchanged.

So I'm guessing your machine does not generate the running_mean, running_var, and num_batches_tracked keys.

Is this because my library versions differ from yours?

Problem running the environment setup

When I run

sed -i "10,14d" pyproject.toml && poetry lock --no-update && poetry install

it stops at step 13/15 with a TypeError.

Also, when I run

docker build \
  -t fl-bench \
  --build-arg IMAGE_SOURCE=karhou/ubuntu:basic \
  --build-arg CHINA_MAINLAND=false .
Hash for nvidia-cublas-cu12 (12.1.3.1) from archive nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl not found in known hashes (was: sha256:98d15fd621af39d255ca783f5e7b7f17d3f25a3e639a307944576aa17b30cc51)

at /usr/local/lib/python3.10/dist-packages/poetry/installation/executor.py:799 in _validate_archive_hash
795│ archive_hash: str = "sha256:" + get_file_hash(archive)
796│ known_hashes = {f["hash"] for f in package.files if f["file"] == archive.name}
797│
798│ if archive_hash not in known_hashes:
→ 799│ raise RuntimeError(
800│ f"Hash for {package} from archive {archive.name} not found in"
801│ f" known hashes (was: {archive_hash})"
802│ )
803│

Cannot install nvidia-cublas-cu12.

• Installing nvidia-cusparse-cu12 (12.1.0.106)
The command '/bin/sh -c poetry install' returned a non-zero code: 1

it fails at the same step, 13/15.

Is it possible to run the main code despite these problems?

Loss function in FedLC

In FedLC, the logit of the correct class does not appear in the denominator of the modified CE loss function, and I think that version does not converge. Do you have the same feeling? In your implementation, that term is in the denominator.

What's more, FedLC has the same idea as DMFL, IJCNN 2021.

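Issue with run results

          > First of all, thank you for your support. ❤

Regarding perfedavg: it combines federated learning with meta-learning. Meta-learning is often criticized for being hard to train, and the original paper used an MLP as the model backbone, trained on MNIST. The implication is that perfedavg's training behavior on non-convex models (such as CNNs) is not actually guaranteed.

So, regarding your question, I'm sorry that I can't clearly explain why perfedavg's training curve is unstable either. I can only guarantee that my code correctly reflects how perfedavg runs.

Hello, sorry to bother you again. When I run it, the loss is NaN. Is this normal?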
image

Originally posted by @TigerAB1 in #30 (comment)

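[Question] Results when using perfedavg

Hello, thank you for your open-source spirit. I find this project very helpful, so I also sponsored you via WeChat as a token of appreciation. I do have a question, though: when running perfedavg, I found that the curve does not rise gradually.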
mnist  3

image

For pFedMe

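Thank you for your open-source code. May I ask: in your pFedMe code, clients are sampled first and then trained, whereas the pFedMe source code trains all clients first and then samples a subset for aggregation. I don't understand why the source code and the paper are designed this way, since it trains many clients whose updates are never used and increases training time. The whole point of FedAvg sampling first and training afterwards is to speed up training. I'm not sure whether my understanding is correct.
One more question: does sampling first and training afterwards leave the performance of pFedAvg unaffected?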
image

Pretrained model evaluation

First of all, thank you very much for the project! I think a lot of researchers/practitioners can benefit from it! It is a huge contribution to the area!

I have a question related to the evaluation of the pretrained model.
I have trained an algorithm (e.g. FedRep) with standard hyperparams on cifar10 and saved the resulting model.

Next, following the instruction, I provided a path to the --external_model_params_file parameter (just hardcoded appropriate value to default path) and everything successfully loaded.

But for me, it is not clear how to evaluate this model.

I did the following:

from src.server.fedrep import FedRepServer

model: FedRepServer = FedRepServer() # path to the weights hardcoded

model.test()
out = model.test_results

print(out)

However, the results I get are far worse than the ones in the cifar10_log.html file:

When training ended:
{100: {'loss': '0.6021 -> 0.0000', 'accuracy': '74.69% -> 0.00%'}}

When I loaded:
{1: {'loss': '98.6518 -> 0.0000', 'accuracy': '18.83% -> 0.00%'}}

I am pretty sure I am missing something. Can you elaborate on this, please?

And also a question more specific to FedRep. Shouldn't we have here unique_model=True, as each of the clients keeps the version of the head from the previous round?

Thank you!

cluster aggregation issue

def aggregate_clusterwise(self):

In this function, all clients' deltas in self.delta_list are used in cluster aggregation, including those of clients that are not participating in the current round.

I don't think it matches the meaning of the original paper. Please correct me if my understanding is wrong
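
The behavior this issue asks for can be sketched as below: average deltas cluster by cluster, but only over the clients that took part in the current round. This illustrates the reporter's suggestion under simplifying assumptions; it is not the repository's aggregate_clusterwise, and the argument names and the list-of-floats delta format are hypothetical.

```python
def aggregate_clusterwise_selected(clusters, deltas, selected_clients):
    """Hypothetical sketch: per-cluster mean delta over this round's participants only.

    clusters: list of lists of client ids
    deltas: dict mapping client id -> list of floats (flattened update)
    selected_clients: ids of the clients that trained this round
    """
    selected = set(selected_clients)
    averaged = {}
    for cluster_id, members in enumerate(clusters):
        active = [cid for cid in members if cid in selected]
        if not active:
            continue  # no participant from this cluster in this round
        dim = len(deltas[active[0]])
        averaged[cluster_id] = [
            sum(deltas[cid][i] for cid in active) / len(active) for i in range(dim)
        ]
    return averaged


clusters = [[0, 1], [2, 3]]
deltas = {0: [1.0, 1.0], 1: [3.0, 3.0], 2: [10.0, 10.0], 3: [0.0, 0.0]}
print(aggregate_clusterwise_selected(clusters, deltas, selected_clients=[0, 1, 2]))
# {0: [2.0, 2.0], 1: [10.0, 10.0]}
```

In the example, client 3 did not participate, so its stale delta does not pull the second cluster's average toward zero.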

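Question about the FedAP algorithm

Hello, could you tell me what f-FedAP and d-FedAP refer to in your reproduction of the FedAP algorithm? Thank you very much! I hope to hear back from you.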

Comparison of the results of FedAvg on Cifar with the original paper

I used the FedAvg CNN code defined in your repository to train on the CIFAR-10 dataset. The final accuracy only converges to about 0.8, which is quite different from what was published in the original paper. Have you tried this experiment?

The following is the result in the paper. It takes me more than 500 rounds to converge to 0.8, and the accuracy does not improve after that.
Untitled

Are the trainset and testset the same within a client?

Hello, thanks for your code.
I have a question for my study: are the trainset and testset the same within a client?
When my Dirichlet parameter alpha is 2, I found that the accuracy is about 50% after 100 rounds, and the labels y look strange, like tensor([7, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 6, 6, 7, 7, 7, 7]). Do you think there are too many 7s?
Please let me know what you think soon; it is very important to me. Thank you very much.

Hello, when I change 'split' to 'user', it can't run anymore

Thanks for your code.
I have a question: when I change 'split' from 'sample' to 'user', it can't run.
I think this may be because clients have no test data in the 'user' split mode, which causes the error below.

File "/sftpFile/src/client/fedavg.py", line 112, in train_and_log
    loss_before / num_samples,
ZeroDivisionError: division by zero

So my first question is: how do I run in the 'user' mode?

I also have another question.
I have noticed that in the 'sample' split mode, the code uses 'test data' and 'train data' from the same client. Does that work? When I test, the acc is 8-90%, higher than centralized training. Is it overfitting? I also want to know how to evaluate the acc: from the chart, or from the test result?

Thank you very much!

  • the chart

image

- the test result

image

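About dataset partitioning

Thank you for sharing!
I have a question about dataset partitioning:
When I run generate_data.py -d cifar10 -c 4 -cn 10, is each client assigned 4 classes? Does each class have the same number of samples?

Looking forward to your reply.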

Data split of femnist dataset

hello, thanks for your code.
I have a question about the data split of femnist dataset.
I ran the following command and expected the dataset to be split into 10 parts.

./preprocess.sh -s niid --sf 1.0 -k 0 -t sample --iu 10

However, args.json still indicates that the client number is 36, like this:

{"dataset": "femnist", "client_num": 36, "fraction": 0.5, "seed": 42, "split": "sample", "iid": true}

I don't know what I missed. Could you kindly explain? Thx!!

[Implementation Error] algorithm "ccvr" code lost a "()"

p1

https://github.com/KarhouTam/FL-bench/blob/54d65a7d91bcf255a16381e103102142d34a72d8/src/server/ccvr.py#L71C1-L73C14
According to the original formula:

image

The last term 'labels_count[c] - 1' should be wrapped by a '()'

          classes_cov[c] -= labels_count[c] / (labels_count[c] - 1) * (
              classes_mean[c].unsqueeze(1) @ classes_mean[c].unsqueeze(0)
          )

Plz correct this.
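
The effect of the missing parentheses can be checked numerically with a scalar analogue of the formula. The sketch below assumes the first term is the sum of squared samples divided by N - 1 (an assumption about the surrounding code, which is not shown here) and compares the result with `statistics.variance`; the helper name is hypothetical.

```python
import statistics


def unbiased_variance_from_moments(samples):
    """Scalar analogue: sum(x^2) / (N - 1)  -  N / (N - 1) * mean^2."""
    n = len(samples)
    mean = sum(samples) / n
    second_moment = sum(x * x for x in samples) / (n - 1)
    return second_moment - n / (n - 1) * mean * mean


samples = [1.0, 2.0, 4.0, 7.0]
n = len(samples)

print(unbiased_variance_from_moments(samples))  # 7.0
print(statistics.variance(samples))             # 7.0

# Without parentheses, n / n - 1 is parsed as (n / n) - 1, which is 0,
# so the mean-correction term would vanish entirely.
print(n / n - 1)  # 0.0
```

In the vector case the squared mean becomes the outer product of the class mean with itself, as in the snippet above, but the scalar factor N / (N - 1) is the same.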

p2

Second, in the same file, there is a problem when a specific class has no data:

    def generate_virtual_representation(
        self, classes_mean: List[torch.Tensor], classes_cov: List[torch.Tensor]
    ):
        data, targets = [], []
        for c, (mean, cov) in enumerate(zip(classes_mean, classes_cov)):

I recommend adding the following check to avoid a crash.

    def generate_virtual_representation(
        self, classes_mean: List[torch.Tensor], classes_cov: List[torch.Tensor]
    ):
        data, targets = [], []
        for c, (mean, cov) in enumerate(zip(classes_mean, classes_cov)):
            if torch.all(torch.isnan(mean)) or torch.all(torch.isnan(cov)):
                continue

TypeError: 'type' object is not subscriptable

C:\Users\20834\anaconda3\envs\ChatGLM-6B-new\python.exe F:\FL-bench\data\generate_data.py -d cifar10 -a 0.1 -cn 100
Traceback (most recent call last):
  File "F:\FL-bench\data\generate_data.py", line 16, in <module>
    from utils.schemes import (
  File "F:\FL-bench\data\utils\schemes\__init__.py", line 5, in <module>
    from .semantic import semantic_partition
  File "F:\FL-bench\data\utils\schemes\semantic.py", line 22, in <module>
    from src.config.utils import get_best_device
  File "F:\FL-bench\src\config\utils.py", line 69, in <module>
    src: Union[OrderedDict[str, torch.Tensor], torch.nn.Module],
TypeError: 'type' object is not subscriptable

\FL-bench\src\config\utils.py

def trainable_params(
    src: Union[OrderedDict[str, torch.Tensor], torch.nn.Module],
    detach=False,
    requires_name=False,
) -> Union[List[torch.Tensor], Tuple[List[torch.Tensor], List[str]]]:

You are hitting a TypeError that says 'type' object is not subscriptable. Python raises it when code applies the [] syntax to a class that does not support subscripting.

The error points at this line:

src: Union[OrderedDict[str, torch.Tensor], torch.nn.Module],

The likely cause is that OrderedDict is imported from collections here. Subscripting collections.OrderedDict, like subscripting the built-in types list, tuple, and dict, is only supported in type hints from Python 3.9 onward (PEP 585).

On Python 3.8 or below, import the generic alias from the typing module instead (typing.OrderedDict is available from Python 3.7.2). Your code should then look something like:

from typing import Union, List, Tuple, OrderedDict
from torch import Tensor

src: Union[OrderedDict[str, Tensor], torch.nn.Module],

Please check which Python version you are using. Either upgrade to Python 3.9+ so that the PEP 585 syntax works, or import OrderedDict from typing.

To fix the bug TypeError: 'type' object is not subscriptable, add the following line to the file \FL-bench\src\config\utils.py:

from typing import Union, List, Tuple, OrderedDict
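
The workaround can be tried in isolation. The sketch below annotates a small function with `typing.OrderedDict`, which is subscriptable on Python 3.7.2 and later, while still building the actual object with `collections.OrderedDict`. The function `parameter_names` is hypothetical, and plain floats stand in for torch tensors so that the example has no third-party dependency.

```python
from collections import OrderedDict as RuntimeOrderedDict
from typing import List, OrderedDict, Union


def parameter_names(src: Union[OrderedDict[str, float], List[str]]) -> List[str]:
    """Hypothetical helper: return parameter names from a state-dict-like mapping."""
    if isinstance(src, RuntimeOrderedDict):
        return list(src.keys())
    return list(src)


# Floats stand in for tensors in this illustration.
state = RuntimeOrderedDict([("conv.weight", 0.5), ("conv.bias", 0.1)])
print(parameter_names(state))  # ['conv.weight', 'conv.bias']
```

The annotation uses the typing alias, whereas runtime checks and construction use the collections class; mixing the two up is what produces the error on older interpreters.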

Data generation problem

Hello, why does it fail when I set cn to 1000?

implementation to segmentation task

Hi KarhouTam.

I'd like to apply your FL benchmark setup to my segmentation task.

However, I think structuring a segmentation model to fit your setup is a little bit tricky.

As I understand it, your setup requires a base layer, a classifier layer, and so on.

Is there any advice for setting up a segmentation model?

Evaluation in test phase

In the test phase, it seems that there is only average of local test on clients, but no global test on the server?

python generate_data.py -d medmnistC -a 0.1 -cn 100

When I use the command python generate_data.py -d medmnistC -a 0.1 -cn 100, it takes a long time to execute and seems to fail because I previously used python generate_data.py -d medmnistC -a 0.5 -cn 100. Do you know how to resolve this issue?

Dataset problem

Hello, if I want to run this framework on my own dataset(image), what should I do?

Pretrained model loading

Many thanks for the contribution to FL community. Really benefit a lot.

When I wanted to load the pretrained model, I didn't find a universal/easy way to do it, (i.e., in args).

After checking the code, I believe I should change the first argument of trainable_params to my desired checkpoint.

        self.model = use_model.to(self.device)
        
        # FIXME: using pre-trained models
        init_trainable_params, self.trainable_params_name = trainable_params(
            self.model, detach=True, requires_name=True
        )

Am I correct? Thank you very much for your reply!

Question about finetune

Hello, first of all, thank you very much for your open-source spirit and contribution. While studying your code, I noticed the following. If I generate the data with --split user, then in

    def finetune(self):
        """
        The fine-tune function. If your method has different fine-tuning operation, consider to override this.
        This function will only be activated while in FL test round.
        """
        self.model.train()
        for _ in range(self.args.finetune_epoch):
            for x, y in self.trainloader:
                if len(x) <= 1:
                    continue

                x, y = x.to(self.device), y.to(self.device)
                logit = self.model(x)
                loss = self.criterion(logit, y)
                self.optimizer.zero_grad()
                loss.backward()
                self.optimizer.step()

testing iterates over for client_id in self.test_clients:, so wouldn't self.trainloader be empty? I'm not sure whether I have misunderstood something. I would be very grateful if you could find the time to answer my question.

FedPer - Set split point of neural network at different locations

Hi there!

How can I set the split point at different locations in a neural network e.g. ResNet18? For example, in their paper (see Figure 4a&b), they use different amount of layers in the classifier, thus different amount of layers in the base.

Is this already implemented?

Best regards,
W

Confusion about test_before and test_after

Hi karhouTam,

I am just confused about understanding the "after" and "before" of test_acc(before) and test_acc(after) in your code. Could you explain them a little bit? Thank you.

Opinion about dataset split

Hi, KarhouTam.
I recently talked with my colleague about dataset splits for traditional FL (fedavg, fedprox, feddyn, etc.).
My colleague insisted that, in order to evaluate an FL algorithm, I should evaluate the model on an isolated dataset
(i.e., for MNIST, first split off 6000~ imgs as a test dataset for global model evaluation, and then assign the rest of the imgs to each client for test and eval).
As far as I know, the global server can't see the entire dataset because of privacy concerns, right?
I think that's why your dataset creation setting also doesn't assign a test dataset for global model evaluation.

what is your opinion about this?

About the convergence of FedLC

FedLC's training on CIFAR10/100 tends to break down after a random epoch, and the epoch at which it breaks down seems to differ across random seeds. The problem doesn't occur in TinyImageNet training. I would appreciate your help.

Problems generating the dataset and running the example demo

Thank you very much for this federated learning algorithm repository. However, I ran into a problem when running it and would appreciate your help.

python generate_data.py -d cifar10 -a 0.1 -cn 100

Traceback (most recent call last):
  File "generate_data.py", line 16, in <module>
    from utils.schemes import (
  File "/root/FL-bench-master/data/utils/schemes/__init__.py", line 5, in <module>
    from .semantic import semantic_partition
  File "/root/FL-bench-master/data/utils/schemes/semantic.py", line 22, in <module>
    from src.config.utils import get_best_device
  File "/root/FL-bench-master/src/config/utils.py", line 69, in <module>
    src: Union[OrderedDict[str, torch.Tensor], torch.nn.Module],
TypeError: 'type' object is not subscriptable

This error occurred when generating the dataset, and I have not found a solution yet.

Bug report of FedMD code

The code of the FedMD algorithm does not work.
When I run the command:

cd src/server 
python fedmd.py

The following error occurs:

Traceback (most recent call last):
  File "/home/cjj/gitproject/FL-bench/src/server/fedmd.py", line 78, in <module>
    server = FedMDServer()
  File "/home/cjj/gitproject/FL-bench/src/server/fedmd.py", line 39, in __init__
    self.trainer = FedMDClient(
  File "/home/cjj/gitproject/FL-bench/src/client/fedmd.py", line 24, in __init__
    self.public_dataset = DATASETS[self.args.public_dataset](
TypeError: MNIST.__init__() got an unexpected keyword argument 'transform'

I think the bug may be in FedMDClient.__init__() in the file src/client/fedmd.py.

My environment

Python 3.10.10

Experiment Arguments:

{
'model': 'lenet5',
'dataset': 'cifar10',
'seed': 42,
'join_ratio': 0.1,
'global_epoch': 100,
'local_epoch': 5,
'finetune_epoch': 0,
'test_gap': 100,
'eval_test': 1,
'eval_train': 0,
'local_lr': 0.01,
'momentum': 0.0,
'weight_decay': 0.0,
'verbose_gap': 100000,
'batch_size': 32,
'visible': 0,
'global_testset': 0,
'straggler_ratio': 0,
'straggler_min_local_epoch': 1,
'use_cuda': 1,
'save_log': 1,
'save_model': 0,
'save_fig': 1,
'save_metrics': 1,
'digest_epoch': 1,
'public_dataset': 'mnist',
'public_batch_size': 32,
'public_batch_num': 5,
'dataset_args': {'dataset': 'cifar10', 'client_num': 100, 'fraction': 0.5, 'seed': 42, 'split': 'sample', 'alpha': 0.1, 'least_samples': 40}
}

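Custom dataset partitioning

Hello,
I would like to customize which data classes each client holds, by client ID. For example, clients 1 and 2 hold only class A, clients 3 and 4 hold only class B, and so on. Do you have any suggestions for my use case? (Which part of the code should I start modifying?)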

Runtime error

I want to specify the model as MobileNetV2 using your code, but I keep getting an error, 'RuntimeError: size mismatch (got input: [10], target: [32]).' Do you know what's happening?

Please, can somebody help me solve this problem

(base) alami@alami-Latitude-7390:~/Téléchargements/FL-bench-master$ sed -i "26,30d" pyproject.toml && poetry lock --no-update && poetry install

RuntimeError

The Poetry configuration is invalid:
- [readme] ['README.md', 'data/README.md'] is not of type 'string'

at /usr/lib/python3/dist-packages/poetry/core/factory.py:43 in create_poetry
39│ message = ""
40│ for error in check_result["errors"]:
41│ message += " - {}\n".format(error)
42│
→ 43│ raise RuntimeError("The Poetry configuration is invalid:\n" + message)
44│
45│ # Load package
46│ name = local_config["name"]
47│ version = local_config["version"]

question

"How does your codebase implement testing for each category of domain? When I run it, I only get one result for the domain."

std

Could you please explain how the standard deviation is calculated in your pfedsim paper? The standard deviation I obtained from running your codebase is quite different from yours

About SCAFFOLD

Thanks for your code! I want to know why the curve of SCAFFOLD on EMNIST behaves like this.
794e9b3a-2e9b-462d-a6a7-20771c0323dd

Question of the output and the test accuracy

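Hello, thank you very much for this very complete FL framework; it is very convenient. I have a few small questions about the output, and I hope you can take a look if you have time. Thank you very much for your help.

I am training ResNet-18 on Tiny-ImageNet with the following command:
python src/server/fedavg.py -m res18 -d tiny_imagenet -jr 1.0 -ge 20 -le 10 -bs 64 -lr 0.01 -mom 0.9 -wd 0.00001 -v 1 -vg 10

I got the following results:
Screenshot from 2023-08-24 10-40-30

(1) Are the results at the top the accuracy obtained with models that have not been aggregated (i.e., after local training)?
(2) Are the numbers in the convergence section at the bottom the accuracy obtained with the global model?
(3) Specifically, how should I use your code to check the global model's accuracy after every training round? Should I set --test_gap to 1?

Thank you very much for your code and your answers. The code is well written, but my ability is limited and I got a bit lost reading it, so the points above are still unclear to me.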
