ceh-2000 / fed_cvae

Code for novel methods for one-shot Federated Learning under high statistical heterogeneity.

License: MIT License
Implement the algorithm described in the FedProx paper. See also these implementations:
Currently, we use Adam as the local optimizer, but this diverges from the standard in the literature. (This is largely because Adam introduces additional hyperparameters, complicating the tuning process.) Try switching the local optimizer to SGD with no momentum and re-tuning learning rates.
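A hedged sketch of what a FedProx-style local update with momentum-free SGD could look like; the proximal term follows the FedProx paper, but mu, lr, and all names here are illustrative assumptions, not the repo's actual API:

```python
# Sketch only: FedProx local update using momentum-free SGD.
# mu (proximal strength), lr, and function names are illustrative.
import torch

def local_update_fedprox(model, global_model, loader, loss_fn, mu=0.01, lr=0.01, epochs=1):
    # Snapshot the global weights w^t that the proximal term anchors to.
    global_params = [p.detach().clone() for p in global_model.parameters()]
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.0)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            # FedProx: add (mu / 2) * ||w - w^t||^2 to the local objective.
            prox = sum(((p - g) ** 2).sum()
                       for p, g in zip(model.parameters(), global_params))
            (loss + 0.5 * mu * prox).backward()
            opt.step()
    return model
```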
Follow the protocol detailed in blue in this doc.
Describe the findings and paste Tensorboard plots into this document. Consider writing a shell script to automate experimentation.
Research the algorithm for Distributed VAE.
We should use a more standard classifier architecture, like the following from McMahan et al. (2017): "a CNN with two 5x5 convolution layers (the first with 32 channels, the second with 64, each followed with 2x2 max pooling), a fully connected layer with 512 units and ReLu activation, and a final softmax output layer (1,663,370 total parameters)."
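A sketch of that quoted architecture in PyTorch; ReLU after each conv is an assumption (the quote only specifies pooling), and with 'same' padding the parameter count lands near the quoted ~1.66M:

```python
# Sketch of the McMahan et al. (2017) MNIST CNN described above.
import torch.nn as nn

class McMahanCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, num_classes),  # softmax is folded into the loss
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```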
Currently, the classifier is consistently trained over all global epochs. However, our pipeline schematic indicates that it should only be trained after all communication is done. Add the ability to re-initialize the server classifier's weights each round (essentially training from scratch each round) to see if this makes a difference at all.
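A small sketch of one way to do the per-round re-initialization, assuming a standard PyTorch module whose layers implement reset_parameters (Conv2d, Linear, etc. do):

```python
# Sketch: re-initialize the server classifier's weights each round.
import torch.nn as nn

def reinitialize(model: nn.Module) -> None:
    for m in model.modules():
        if hasattr(m, "reset_parameters"):
            m.reset_parameters()  # re-draw this layer's default initialization
```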
We want to explain how FedVAE works. Draft it here.
Currently, in server_fed_vae.py we only sample latent variables from a tight uniform distribution to obtain high-quality samples for classifier training. It may help to sample z's from a multivariate normal instead, to obtain a wider variety of intra-class variation for classifier training. It's likely that increasing the number of samples used for classifier training will also be necessary. A similar approach may help for the knowledge-distillation fine-tuning of the server VAE as well.
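A sketch of the multivariate-normal alternative; the decoder(z, y) signature is an assumption, not necessarily the repo's:

```python
# Sketch: draw latents from N(0, I) instead of a tight uniform distribution.
import torch

def sample_classifier_training_data(decoder, num_samples, z_dim, num_classes):
    z = torch.randn(num_samples, z_dim)                # N(0, I) latents
    y = torch.randint(0, num_classes, (num_samples,))  # random class conditions
    with torch.no_grad():
        x = decoder(z, y)
    return x, y
```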
We want to write the literature review/introduction section of the paper. Draft it here.
Wait until after meeting with Jay on 8/16.
Implement the chest x-ray dataset used in this paper.
After all algorithms are implemented: clean up the hyperparameters that are printed/logged to tensorboard in main.py. As an example, for the unachievable ideal (centralized model), alpha and number of local epochs should not be printed/logged.
Train the global classifier with the training data selected according to the sampling ratio. Modify the Data class so that it does not separate the data according to number of users (it should just hand back a single dataset of all available training data according to the sampling ratio). Log results to tensorboard as central_model_sampling_ratio=x.x_number_of_epochs=xxx.
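A hedged sketch of what the pooled (centralized) loading could look like, assuming a torchvision-style MNIST dataset; names are illustrative:

```python
# Sketch: return one pooled training subset per the sampling ratio, no per-user splits.
import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

def load_centralized_dataset(sample_ratio, seed=0):
    full = datasets.MNIST("data", train=True, download=True,
                          transform=transforms.ToTensor())
    g = torch.Generator().manual_seed(seed)
    n = int(sample_ratio * len(full))
    keep = torch.randperm(len(full), generator=g)[:n]
    return Subset(full, keep.tolist())
```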
In the original FL paper (McMahan et al. (2017)), they average weights proportionally to the number of samples each user has in its local dataset. Currently, we do an unweighted average of user weights. Update the average_weights function of utils.py accordingly (see Algorithm 1 of McMahan et al. (2017) for details); a sketch is given below.

We need more datasets than just MNIST.
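A minimal sketch of the sample-weighted averaging from the item above, assuming PyTorch state dicts; names are illustrative:

```python
# Sketch of sample-weighted FedAvg (Algorithm 1, McMahan et al. 2017).
import torch

def average_weights(user_models, user_num_samples):
    """Average state dicts, weighting each user by its local dataset size."""
    total = float(sum(user_num_samples))
    avg = {k: torch.zeros_like(v, dtype=torch.float32)
           for k, v in user_models[0].state_dict().items()}
    for model, n in zip(user_models, user_num_samples):
        for k, v in model.state_dict().items():
            avg[k] += (n / total) * v.float()
    return avg  # note: integer buffers (e.g., BatchNorm counters) may need care
```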
It would be cleaner and read better to have sample_z in utils.py so that we stop calling it via the first user model.
Currently, we just average the decoders to produce an aggregated server model. But, in our pipeline we include a more sophisticated knowledge-distillation-based aggregation scheme, which we should implement. To validate, check the samples for each aggregation method in tensorboard.
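A hedged sketch of how the knowledge-distillation aggregation could look: a server (student) decoder is trained to match the averaged outputs of the user (teacher) decoders on freshly sampled latents. The decoder(z, y) signature and all names are assumptions, not the repo's actual API:

```python
# Sketch: distill an ensemble of teacher decoders into one student decoder.
import torch
import torch.nn.functional as F

def distill_server_decoder(student, teachers, z_dim, num_classes,
                           steps=1000, batch_size=64, lr=1e-3):
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(steps):
        z = torch.randn(batch_size, z_dim)
        y = torch.randint(0, num_classes, (batch_size,))
        with torch.no_grad():
            target = torch.stack([t(z, y) for t in teachers]).mean(dim=0)
        loss = F.mse_loss(student(z, y), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student
```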
For final experiments, we'd like to show the stability of FedVAE. To do this, we should run the model several times with different weight initializations but with the same dataset split; the random seed shouldn't affect how the dataset is distributed.
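A sketch of the seed separation this implies, assuming NumPy drives the dataset split and torch drives weight initialization; seed values are illustrative:

```python
# Sketch: keep the split seed fixed, vary only the weight-initialization seed.
import numpy as np
import torch

DATA_SPLIT_SEED = 42  # fixed across all runs -> identical dataset split

def seed_run(weight_init_seed):
    np.random.seed(DATA_SPLIT_SEED)      # dataset partitioning only
    torch.manual_seed(weight_init_seed)  # varied per run: weight initialization
```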
Research the security guarantees of VAEs with respect to sampling around their mean. Go on Google Scholar and search for:
Put findings here.
To qualitatively track the aggregation scheme's ability to preserve local learning, we should log sample images from the aggregated decoder to tensorboard. See our implementation of this in the previous repository; look at the method eval_conditional_image_generation.
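A minimal sketch of such logging with tensorboard's SummaryWriter; the decoder(z, y) signature, tag, and shapes are assumptions:

```python
# Sketch: log one generated image per class from the aggregated decoder.
import torch
from torch.utils.tensorboard import SummaryWriter
from torchvision.utils import make_grid

def log_decoder_samples(writer, decoder, z_dim, num_classes, step):
    z = torch.randn(num_classes, z_dim)
    y = torch.arange(num_classes)          # one sample per class
    with torch.no_grad():
        imgs = decoder(z, y)               # expected shape (num_classes, C, H, W)
    writer.add_image("aggregated_decoder/samples",
                     make_grid(imgs, nrow=num_classes), step)
```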
Make it possible to easily test local computation amounts without restarting the run. After each round of local training, communicate upwards to the server and log test results; then, instead of communicating downwards, run another local epoch and repeat.
Currently, FedVAE generates an equal number of samples from each user (teacher decoder).
We want the learning rate to be passed in with command line arguments for FedVAE.
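A sketch of the command line flag in main.py; the name --local_lr is illustrative, not the repo's actual flag:

```python
# Sketch: expose the local learning rate as a CLI argument.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--local_lr", type=float, default=0.01,
                    help="learning rate for local user optimizers")
args = parser.parse_args()
```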
Implement the algorithm described in SCAFFOLD: Stochastic Controlled Averaging for Federated Learning. Read the paper closely and decide whether or not this would be worth including in the suite of extended communication algorithms, i.e., is it sufficiently different from FedProx in formulation/performance?
Standard FL algorithms classically only involve a fraction of users during each communication round. Add this fraction as an option in main.py and integrate it into server.py by adding a base method that (uniformly) randomly samples users according to this fraction. server.py should be able to use this fraction of selected users, although one-shot methods should set it to 1.0 by default since they involve all users (a sketch of such a sampling method appears below).

Implement the one-shot FL algorithm described in Guha et al. (2019), specifically the ensembling version that doesn't require auxiliary data. Update the create_users method in the base class to split data subsets into additional training/validation subsets (if necessary for the sampling scheme). The train method should allow all users to train and then have selected users upload their models.

Hi, I have a small question. As far as I understand CVAE, the encoder requires labels as inputs. Why does the conditional encoder in the code only take images as input?
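Returning to the user-sampling item above, a minimal sketch of the proposed base method; names are illustrative, and one-shot servers would pass sample_frac=1.0:

```python
# Sketch: uniformly sample a fraction of users each communication round.
import numpy as np

def select_users(users, sample_frac, rng=None):
    rng = rng or np.random.default_rng()
    k = max(1, int(sample_frac * len(users)))
    chosen = rng.choice(len(users), size=k, replace=False)
    return [users[i] for i in chosen]
```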
We want to show that few-shot federated learning is a better setting for our model than one-shot. Thus, create a new algorithm that implements FedVAE as a one-shot algorithm. See this paper for reference.
- Add onefedvae as a new algorithm option and expose its onefedvae parameters.
- Subclass ServerFedVAE as ServerOneFedVAE and overwrite the server classifier training to sample a new dataset from the collected decoders (same as ServerFedVAE for decoder knowledge distillation) and then train the classifier on this dataset (use the oneshot algorithm as reference).
- Update README.md with instructions to run.

When using python3 main.py --algorithm fedvae --num_users 2 --alpha 0.1 --sample_ratio 0.25 --glob_epochs 2 --local_epochs 3 --should_log 1 --z_dim 50 --beta 1.0 --classifier_num_train_samples 1000 --classifier_epochs 5 --decoder_num_train_samples 1000 --decoder_epochs 5, the label distributions begin to not sum to one. We need to ensure that label distributions always sum to one. This may be a Python precision issue.
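A sketch of the likely fix: renormalize each label distribution so accumulated floating-point error cannot push the sum away from 1. The Dirichlet draw (given alpha) and names are illustrative:

```python
# Sketch: guard Dirichlet-drawn label distributions against precision drift.
import numpy as np

def label_distribution(alpha, num_classes, rng=None):
    rng = rng or np.random.default_rng()
    p = rng.dirichlet(alpha * np.ones(num_classes))
    return p / p.sum()  # force the distribution to sum to exactly 1
```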
For every experiment, we want the hard-coded converged ideal value to appear as a line across the top of our plot. In main.py, log this value at every global epoch up to args.glob_epochs. Note: this is purely for easier visualization, not a true experiment.
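A minimal sketch with tensorboard's SummaryWriter; the tag and values are placeholders:

```python
# Sketch: write the constant once per global epoch so tensorboard renders a
# horizontal "ideal" line above the experiment curves.
from torch.utils.tensorboard import SummaryWriter

IDEAL_ACCURACY = 0.99  # stands in for the hard-coded converged ideal
glob_epochs = 10       # stands in for args.glob_epochs in main.py

writer = SummaryWriter()
for epoch in range(glob_epochs):
    writer.add_scalar("test/ideal_accuracy", IDEAL_ACCURACY, epoch)
writer.close()
```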