NOTE: FedJAX is not an officially supported Google product. FedJAX is still in the early stages and the API will likely continue to change.
FedJAX is a JAX-based open source library for Federated Learning simulations that emphasizes ease-of-use in research. With its simple primitives for implementing federated learning algorithms, prepackaged datasets, models and algorithms, and fast simulation speed, FedJAX aims to make developing and evaluating federated algorithms faster and easier for researchers. FedJAX works on accelerators (GPU and TPU) without much additional effort. Additional details and benchmarks can be found in our paper.
You will need a moderately recent version of Python. Please check the PyPI page for the up-to-date version requirement.
First, install JAX. For a CPU-only version:
pip install --upgrade pip
pip install --upgrade jax jaxlib # CPU-only version
For other devices (e.g. GPU), follow these instructions.
Then, install FedJAX from PyPI:
pip install fedjax
Or, to upgrade to the latest development version of FedJAX from GitHub:
pip install --upgrade git+https://github.com/google/fedjax.git
Below is a simple example to verify FedJAX is installed correctly.
import fedjax
import jax
import jax.numpy as jnp
import numpy as np
# {'client_id': client_dataset}.
fd = fedjax.InMemoryFederatedData({
    'a': {
        'x': np.array([1.0, 2.0, 3.0]),
        'y': np.array([2.0, 4.0, 6.0]),
    },
    'b': {
        'x': np.array([4.0]),
        'y': np.array([12.0]),
    },
})
# Initial model parameters.
params = jnp.array(0.5)
# Mean squared error.
mse_loss = lambda params, batch: jnp.mean(
    (jnp.dot(batch['x'], params) - batch['y'])**2)
# Loss for clients 'a' and 'b'.
print(f"client a loss = {mse_loss(params, fd.get_client('a').all_examples())}")
print(f"client b loss = {mse_loss(params, fd.get_client('b').all_examples())}")
The following tutorial notebooks provide an introduction to FedJAX:
You can also take a look at some of our working examples:
To cite this repository:
@article{fedjax2021,
title={{F}ed{JAX}: Federated learning simulation with {JAX}},
author={Jae Hun Ro and Ananda Theertha Suresh and Ke Wu},
journal={arXiv preprint arXiv:2108.02117},
year={2021}
}
fedjax's Issues
Implement standard CIFAR-100 model in fedjax.models.cifar100
Add a standard implementation of the model for the CIFAR-100 task. The dataset can be found in fedjax.datasets.cifar100.
For the model architecture, we should follow “Adaptive Federated Optimization”. The model architecture is detailed in section 4 as a ResNet-18 (replacing batch norm with group norm). Code for this paper and a Keras implementation of the model can be found here. We suggest using either haiku or flax to implement the model for use with JAX.
If you choose to use haiku, you can use fedjax.create_model_from_haiku to create a fedjax compatible model. If you choose to use flax, wrapping it in a fedjax.Model is fairly straightforward and we can provide guidance for this.
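For illustration, here is a minimal sketch of the haiku route. The tiny linear network below is a placeholder, not the required ResNet-18 with GroupNorm, and the keyword names reflect our reading of fedjax.create_model_from_haiku, so double-check them against the API docs:

import haiku as hk
import numpy as np
import fedjax

def forward_pass(batch):
  # Placeholder network; the actual task needs a ResNet-18 with GroupNorm.
  return hk.Linear(100)(batch['x'])

model = fedjax.create_model_from_haiku(
    transformed_forward_pass=hk.transform(forward_pass),
    sample_batch={'x': np.zeros((1, 32 * 32 * 3), np.float32),
                  'y': np.zeros((1,), np.int32)},
    train_loss=lambda batch, preds: fedjax.metrics.unreduced_cross_entropy_loss(
        batch['y'], preds),
    eval_metrics={'accuracy': fedjax.metrics.Accuracy()})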
A good example to follow is #265 that checks in a simple linear model for CIFAR-100 and includes the model implementation, tests, and baseline results with FedAvg using this script. Make sure to add a flags file similar to https://github.com/google/fedjax/blob/main/experiments/fed_avg/fed_avg.CIFAR100_LOGISTIC.flags and add the new task to https://github.com/google/fedjax/blob/main/fedjax/training/tasks.py.
Thanks for your contributions!
Support for haiku models with non-trainable state
Hi!
Congrats on this great library! I started using it a few days ago and I love it!
Is there any way to use a haiku model with a non-trainable state (e.g. to use batch norm)?
I couldn't find a straightforward way, but maybe I'm missing something.
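For concreteness, this is the kind of model I mean (plain Haiku, runnable on its own):

import haiku as hk
import jax
import jax.numpy as jnp

def forward(x, is_training):
  x = hk.Linear(8)(x)
  # BatchNorm keeps its moving statistics in non-trainable state.
  x = hk.BatchNorm(create_scale=True, create_offset=True, decay_rate=0.9)(
      x, is_training)
  return x

transformed = hk.transform_with_state(forward)
params, state = transformed.init(jax.random.PRNGKey(0), jnp.ones((2, 4)), True)
# apply returns the output together with the updated state.
out, new_state = transformed.apply(params, state, None, jnp.ones((2, 4)), True)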
Thanks a lot for your help!
External contributions?
Hello. Thanks for open sourcing this library! I'm wondering if this repository is open to external contributions? If yes, I'd be interested to send a PR and contribute an implementation of FedPA. Thanks.
FedJAX depends on TensorFlow Federated?
I am helping users install FedJAX for use in their federated learning research projects and I noticed that installing FedJAX is pulling in TensorFlow Federated (0.17) and TensorFlow (2.3). I don't see either of these listed as dependencies of FedJAX, so I am trying to understand why they are being pulled in by pip install fedjax.
Feature request: Convert standard dataset into a federated dataset
Synthetic federated datasets can be constructed from standard centralized ones by artificially splitting them among clients. This is usually done using a Dirichlet distribution (e.g. Hsu et al. 2019).
Such synthetic datasets are very useful since we can explicitly control the total number of users, as well as the heterogeneity.
It would be great to have primitives which can automatically convert a standard NumPy dataset into a FedJAX dataset, as sketched below.
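For example, a minimal sketch of such a primitive. dirichlet_split is a hypothetical helper, not an existing fedjax function:

import numpy as np
import fedjax

def dirichlet_split(x, y, num_clients, alpha, seed=0):
  # Hypothetical helper: partition (x, y) across clients, drawing per-class
  # client proportions from a Dirichlet(alpha) distribution.
  rng = np.random.RandomState(seed)
  client_indices = [[] for _ in range(num_clients)]
  for c in range(int(y.max()) + 1):
    idx = np.flatnonzero(y == c)
    rng.shuffle(idx)
    proportions = rng.dirichlet(alpha * np.ones(num_clients))
    cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
    for i, part in enumerate(np.split(idx, cuts)):
      client_indices[i].extend(part.tolist())
  return fedjax.InMemoryFederatedData({
      str(i): {'x': x[ids], 'y': y[ids]}
      for i, ids in enumerate(client_indices)
  })

Smaller alpha yields more heterogeneous clients; larger alpha approaches a uniform split.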
Centralized (server-only) algorithm
Add functionality which performs a single update step on each client using its entire dataset as one batch. This replicates running centralized algorithms on federated datasets.
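For instance, a sketch of the pooled (centralized) direction, reusing the toy fd from the quick start above and assuming FederatedData.client_ids() iterates over all clients:

import numpy as np
# Pool every client's examples back into one centralized dataset.
all_examples = [fd.get_client(cid).all_examples() for cid in fd.client_ids()]
pooled = {k: np.concatenate([ex[k] for ex in all_examples])
          for k in all_examples[0]}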
Support for manually modifying client/server learning rate
Hi,
I'm playing around with the clients' learning rate, but I cannot find a clean way of modifying it.
Basically, I need to change the LR following a schedule based on the current round.
Is that possible?
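The closest I've found so far (a sketch, assuming fedjax's optimizers wrap optax and accept an optax schedule in place of a fixed learning rate; the decay numbers are made up):

import optax
# Halve the learning rate every 100 client steps (illustrative values).
schedule = optax.exponential_decay(
    init_value=0.1, transition_steps=100, decay_rate=0.5, staircase=True)
client_optimizer = optax.sgd(learning_rate=schedule)

This counts optimizer steps rather than rounds, though, so a proper round-based hook would still be needed.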
Thanks
How to create a validation dataset?
Hello!
I may need to split each client's train dataset into train and validation parts for grid search purposes (for example, tuning the stepsizes in a method). How can this be achieved in the framework?
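The closest I've come up with is the following sketch, reusing the toy fd from the quick start; it assumes FederatedData.preprocess_client maps a (client_id, examples) -> examples function over each client's examples:

def train_part(client_id, examples):
  n = len(next(iter(examples.values())))
  return {k: v[:int(0.8 * n)] for k, v in examples.items()}

def valid_part(client_id, examples):
  n = len(next(iter(examples.values())))
  return {k: v[int(0.8 * n):] for k, v in examples.items()}

train_fd = fd.preprocess_client(train_part)
valid_fd = fd.preprocess_client(valid_part)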
CIFAR 100 Questions
Hi, thanks for the awesome library! I want to ask a couple of questions related to CIFAR100 datasets.
- I noticed that while the dataset is available in the library, the model is not. Curious if a model for CIFAR100 is work-in-progress, or if there is no short-term plan for this?
- Looking at the CIFAR100 dataset, this seems to be inconsistent with Google's TFF. Notably, the cropping size and normalization are done differently from TFF. Is this intentional? Would it be correct to say that we could expect this to mirror TFF's design eventually?
Thanks in advance for all the help!
Problem with the Quick Start in README.md
I tried to run the code in the QuickStart and I found some problems.
federated_data = fedjax.FederatedData()
cannot be instantiated because it is an abstract class, so I replaced it with
client_a_data = {
    'x': np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]),
    'y': np.array([7, 8]),
}
client_b_data = {'x': np.array([[9.0, 10.0, 11.0]]), 'y': np.array([12])}
client_to_data_mapping = {'a': client_a_data, 'b': client_b_data}
federated_data = fedjax.InMemoryFederatedData(client_to_data_mapping)
Everything else is the same as the Quick Start, but I got an error:
for client_id, client_output, _ in func(shared_input, clients):
for client_id, client_batches, client_input in clients:
ValueError: not enough values to unpack (expected 3, got 2)
It seems that client_batches is missing and we need to batch the dataset, but there is no example which fits this situation.
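My best guess at what's expected here: fedjax.for_each_client consumes (client_id, client_batches, client_input) triples, so each client's dataset has to be batched first. A sketch, assuming ClientDataset.batch accepts a batch_size keyword and that the client input is a PRNGKey:

import jax
clients = [
    (client_id,
     list(federated_data.get_client(client_id).batch(batch_size=1)),
     jax.random.PRNGKey(0))
    for client_id in federated_data.client_ids()
]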
Full EMNIST example does not exhibit parallelization
Hi! I am facing an issue with parallelizing the base code provided by the developers.
- My local workstation contains two GPUs.
- I installed FedJAX in a conda environment.
- I downloaded the "emnist_fed_avg.py" file from the "examples" folder, deleted the "fedjax.training.set_tf_cpu_only()" line, and replaced fed_avg.federated_averaging with fedjax.algorithms.fed_avg.federated_averaging on line 61.
- Having activated the conda environment, I ran the file with python emnist_fed_avg.py. The file runs correctly and prints the expected output (round nums and train/test metrics on each 10th round).
- The nvidia-smi command shows zero percent utilization and almost zero memory usage on one of the GPUs (and ~40% utilization/maximum memory usage on the other GPU).
Any ideas what I am doing wrong?
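One thing I plan to try (a guess: fedjax dispatches client computations through a configurable for_each_client backend, and by default the work may land on a single device; the pmap backend should spread clients across local devices):

import fedjax
fedjax.set_for_each_client_backend('pmap')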
Support for gldv2 and inaturalist datasets
I think it would be great to port these datasets from TFF to FedJAX.
I would be happy to make the effort and contribute to the library, but I need a bit of support from the fedjax team 🙂
By looking at the TFF codebase (gldv2, inaturalist), it looks like the load_data_from_cache function creates a tfrecords file for each client.
The only concrete classes that I see are SQLiteFederatedData and InMemoryFederatedData, but I don't think they are meant for this use case. What would be the best way to map the clients into a FederatedDataset?
We could replicate something like FilePerUserClientData.
Thanks!
Clarifying the meaning of "weight"
In the Intro notebook, the backward_pass_output from model.backward has a weight feature.
It seems to me that this is used for performing a weighted averaging in FedAvg, but it is not clear to me how. Perhaps this could be renamed to batch_size?
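For context, my current understanding as a toy sketch (the helper below is illustrative, not fedjax's implementation): each client's update is averaged with weight proportional to its number of examples.

import jax

def weighted_mean(updates_and_weights):
  # Weighted average of pytree updates; weights are e.g. client batch sizes.
  total = sum(w for _, w in updates_and_weights)
  scaled = [jax.tree_util.tree_map(lambda x: x * (w / total), u)
            for u, w in updates_and_weights]
  return jax.tree_util.tree_map(lambda *xs: sum(xs), *scaled)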
Error in the Stackoverflow Tokenizer example
TensorFlow version: 2.5.3
fedjax version: 0.0.16
jax version: 0.4.8
When I follow the docs (https://fedjax.readthedocs.io/en/latest/fedjax.datasets.html#fedjax.datasets.stackoverflow.load_data) to process the Stackoverflow dataset by using
from fedjax.datasets import stackoverflow
# Load partially preprocessed splits.
train, held_out, test = stackoverflow.load_data(cache_dir='../data')
# Apply tokenizer during batching.
tokenizer = stackoverflow.StackoverflowTokenizer()
train_max_length, eval_max_length = 20, 30
train_for_train = train.preprocess_batch(
    tokenizer.as_preprocess_batch(train_max_length))
train_for_eval = train.preprocess_batch(
    tokenizer.as_preprocess_batch(eval_max_length))
It has the following error:
2023-05-06 23:46:33.460149: W tensorflow/core/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "Not found: Could not locate the credentials file.". Retrieving token from GCE failed with "Failed precondition: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Could not resolve host: metadata".
Traceback (most recent call last):
File "test.py", line 26, in <module>
tokenizer = stackoverflow.StackoverflowTokenizer()
File "/home/yik/anaconda2/envs/fl/lib/python3.8/site-packages/fedjax/datasets/stackoverflow.py", line 185, in __init__
self._table = tf.lookup.StaticVocabularyTable(
File "/home/yik/anaconda2/envs/fl/lib/python3.8/site-packages/tensorflow/python/ops/lookup_ops.py", line 1255, in __init__
raise TypeError("Invalid key dtype, expected one of %s, but got %s." %
TypeError: Invalid key dtype, expected one of (tf.int64, tf.string), but got <dtype: 'float32'>.
Exception ignored in: <function CapturableResource.__del__ at 0x2b2156f4c040>
Traceback (most recent call last):
File "/home/yik/anaconda2/envs/fl/lib/python3.8/site-packages/tensorflow/python/training/tracking/tracking.py", line 269, in __del__
with self._destruction_context():
AttributeError: 'StaticVocabularyTable' object has no attribute '_destruction_context'
Could you please help fix this?
Add support for stateful clients
At this moment I don't see how to implement a fedjax.FederatedAlgorithm with stateful clients, which would be necessary to implement personalized federated algorithms. It would be great to include an example similar to the one in TensorFlow Federated.
Implementing SCAFFOLD
It might be a good idea to have an implementation of SCAFFOLD as well in the algorithms. I think this can be done by modifying the existing Mime implementation.
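For reference, the heart of the change would be SCAFFOLD's corrected local step, sketched here on toy scalars (c and c_i denote the server and client control variates; all values are illustrative):

import jax.numpy as jnp

params = jnp.array(0.5)
grads = jnp.array(0.2)
c, c_i = jnp.array(0.1), jnp.array(0.05)
lr = 0.1
# SCAFFOLD replaces the raw gradient with grads - c_i + c.
params = params - lr * (grads - c_i + c)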