
gossipy's People

Contributors

makgyver, mohamedlegh, purushoath02


gossipy's Issues

Issue: Getting the code to utilize the Google Colab GPU

Hi, I'm Parsa, a master's student at the University of Semnan.
I am trying to get gossipy (particularly main_onoszko_2021.py) to run in Google Colab. It runs successfully, but training takes a long time and the GPU shows 0% utilization, even though CUDA is available to PyTorch and the GPU is enabled in Colab. Let me know if you want to take a look at my code; I will email it to you.
By the way, I have tried workarounds like numba's @jit(target_backend='cuda'). If you can put me on the right path, or tell me how you managed to get it to use the GPU, I would be grateful.
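
For what it's worth, here is the minimal sanity check I would run first (plain PyTorch, nothing gossipy-specific) to confirm that the runtime itself sees the GPU:

import torch

# Confirm that PyTorch sees the Colab GPU before blaming the library.
print(torch.cuda.is_available())      # should print True on a GPU runtime
print(torch.cuda.get_device_name(0))  # e.g. "Tesla T4"

# Tensors and models only use the GPU if they are explicitly moved there.
x = torch.randn(1000, 1000, device="cuda")
print(x.device)                       # should print "cuda:0"

If this prints True but the simulation still shows 0% utilization, my guess is that the models or tensors are never actually moved to the device with .to(...).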

Are you open to contributions?

Hello,
I'm a PhD student at Sorbonne University and I've started to look at gossip learning. I've just discovered your simulator and I think I will use it for my work.
If I add new protocols or algorithms to the simulator, are you open to me adding them to the project as pull requests?

Thanks,
Mohamed Amine LEGHERABA

Same code for Linear Regression and Logistic Regression?

Hello,
I'm new to machine learning, but it seems to me that the implementation of Logistic Regression is the same as Linear Regression; it is simply a linear transformation: https://github.com/makgyver/gossipy/blob/main/gossipy/model/nn.py#L166

I would have expected a sigmoid transformation, with the torch.sigmoid function, for example as in this tutorial: https://towardsdatascience.com/logistic-regression-with-pytorch-3c8bbea594be

Is it a bug, or did I miss something?
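
For comparison, here is a minimal sketch (hypothetical class, not the gossipy implementation) of what I had in mind:

import torch
import torch.nn as nn

class LogisticRegressionSketch(nn.Module):
    """Linear layer followed by a sigmoid, squashing outputs into (0, 1)."""
    def __init__(self, input_dim: int):
        super().__init__()
        self.linear = nn.Linear(input_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Without this torch.sigmoid, the module is plain linear regression.
        return torch.sigmoid(self.linear(x))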

Thanks,
Mohamed

Issue when the network is composed of one node

I know that it doesn't make sense to create a network with one node, but I did it to compare results with a centralized solution.
This is the code I used (adapted from main_ormandi.py):

from gossipy import set_seed
from gossipy.core import AntiEntropyProtocol, CreateModelMode, StaticP2PNetwork, UniformDelay
from gossipy.node import GossipNode
from gossipy.model.handler import PegasosHandler
from gossipy.model.nn import AdaLine
from gossipy.data import load_classification_dataset, DataDispatcher
from gossipy.data.handler import ClassificationDataHandler
from gossipy.simul import GossipSimulator, SimulationReport
from gossipy.utils import plot_evaluation

set_seed(42)
X, y = load_classification_dataset("spambase", as_tensor=True)
y = 2*y - 1 #convert 0/1 labels to -1/1

data_handler = ClassificationDataHandler(X, y, test_size=.1)
data_dispatcher = DataDispatcher(data_handler, 1, eval_on_user=False, auto_assign=True)
topology = StaticP2PNetwork(1, None)
model_handler = PegasosHandler(net=AdaLine(data_handler.size(1)),
                               learning_rate=.01,
                               create_model_mode=CreateModelMode.MERGE_UPDATE)

# For loop to repeat the simulation
nodes = GossipNode.generate(data_dispatcher=data_dispatcher,
                            p2p_net=topology,
                            model_proto=model_handler,
                            round_len=100,
                            sync=False)

simulator = GossipSimulator(
    nodes=nodes,
    data_dispatcher=data_dispatcher,
    delta=100,
    protocol=AntiEntropyProtocol.PUSH,
    delay=UniformDelay(0,10),
    online_prob=.2, # Approximates the average online rate of the STUNner's smartphone traces
    drop_prob=.1,   # Simulates the possibility of message dropping
    sampling_eval=.1
)

report = SimulationReport()
simulator.add_receiver(report)
simulator.init_nodes(seed=42)
simulator.start(n_rounds=100)

plot_evaluation([[ev for _, ev in report.get_evaluation(False)]], "Overall test results")

And the results I get back are the following:

# INFO     # Sent messages: 95                                                                                   simul.py:248
# INFO     # Failed messages: 79                                                                                 simul.py:249
# INFO     Total size: 5415                                                                                      simul.py:250
# INFO     accuracy: 0.62                                                                                        utils.py:181
# INFO     precision: 0.62                                                                                       utils.py:181
# INFO     recall: 0.62                                                                                          utils.py:181
# INFO     f1_score: 0.61                                                                                        utils.py:181
# INFO     auc: 0.66                                                                                             utils.py:181

In the algorithm from the paper ormandi2013, a node waits to receive a model before updating it on its local data, so with a single node the model never learns. This is a bit sad but expected. The part that is very strange to me is the number of "failed" messages: I would have supposed that all messages would fail, since there are no other nodes in the network. (For what it's worth, 79/95 ≈ 0.83 is close to 1 - online_prob * (1 - drop_prob) = 0.82, so perhaps the failures simply reflect those probabilities rather than the topology.)
What do you think? Is this expected or a bug?

TPU support for gossipy

I recently attempted to bring TPU support to gossipy. However, the TPU run shows an estimated 4 hours and 30 minutes to complete my simulation, significantly slower than the GPU training time of approximately 45 minutes. I was wondering if you could make it work:
I have read that TPUs are supposed to be significantly faster, but my attempt does not reflect that!
Here is how I changed things.
First, installing torch_xla:
!pip install torch_xla https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-2.0-cp310-cp310-linux_x86_64.whl
import torch_xla.core.xla_model as xm

import torch

class GlobalSettings(metaclass=Singleton):
    """Global settings for the library."""

    _device = 'cpu'

    def auto_device(self) -> torch.device:
        """Set the device to TPU if available, otherwise CUDA if available, otherwise CPU.

        Returns
        -------
        torch.device
            The device.
        """
        # torch_xla exposes no `xla_device_exists` helper; probing
        # xm.xla_device() and catching the failure is one way to detect
        # whether a TPU is attached.
        try:
            self._device = xm.xla_device()
        except RuntimeError:
            if torch.cuda.is_available():
                self._device = torch.device('cuda')
            else:
                self._device = torch.device('cpu')
        return self._device

    def set_device(self, device_name: str) -> torch.device:
        """Set the device.

        Parameters
        ----------
        device_name : str
            The name of the device to set (possible values are 'auto', 'cuda',
            'cpu', and 'tpu'). When device_name is 'auto', the TPU is used if
            available, then 'cuda', otherwise 'cpu'.

        Returns
        -------
        torch.device
            The device.
        """
        if device_name == "auto":
            return GlobalSettings().auto_device()
        elif device_name == "tpu":
            # xm.xla_device() raises if no XLA device is available.
            self._device = xm.xla_device()
        else:
            self._device = torch.device(device_name)
        return self._device

    def get_device(self):
        """Get the device.

        Returns
        -------
        torch.device
            The device.
        """
        return self._device
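
A hypothetical usage of the patched settings (GlobalSettings is the singleton I modified above):

settings = GlobalSettings()
device = settings.set_device("tpu")  # or "auto" to fall back to cuda/cpu
print(device)                        # e.g. xla:0 when a TPU runtime is attached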

(Screenshots attached: CaptureTPU shows the estimated TPU run time, CaptureGPU the GPU run time.)

Questions about the implementation of an algorithm

Hi,
First, thank you for your good work.
I have a question regarding the class TorchModelPartition implemented here:

import math
from typing import Dict, Optional, Tuple

import torch
from torch import LongTensor
from torch.nn import ParameterList

from gossipy.model import TorchModel

class TorchModelPartition:
    def __init__(self, net_proto: TorchModel, n_parts: int):
        """Class that manages the partitioning of a pytorch model.

        TorchModelPartition handles how to partition a pytorch model as well as the merge of
        partitioned models. The partitioning is deterministic. It divides the parameters of the
        model in ``n_parts`` parts of equal size starting from the first layer and going to the
        last layer.

        The created partition is stored in the ``partitions`` attribute which is a dictionary
        containing the indices of the parameters to be sampled in each layer.

        Parameters
        ----------
        net_proto : TorchModel
            The prototype of the model to be partitioned.
        n_parts : int
            The number of partitions to be created.

        Notes
        -----
        Partitioning is only supported for neural networks with at most 3D layers.
        """
        self._check(net_proto)
        self.str_arch = str(net_proto)
        self.n_parts = min(n_parts, net_proto.get_size())
        self.partitions = self._partition(net_proto, self.n_parts)

    def _check(self, net: TorchModel) -> None:
        plist = ParameterList(net.parameters())
        for t in plist:
            if t.dim() > 3:
                raise TypeError("Partitioning is only supported for neural "\
                                "networks with at most 3D layers.")

    def _partition(self,
                   net: TorchModel,
                   n: int) -> Dict[int, Dict[int, Optional[Tuple[LongTensor, ...]]]]:
        plist = ParameterList(net.parameters())
        parts = {i : {j : None for j in range(len(plist))} for i in range(n)}
        net_size = net.get_size()
        mu = math.floor(net_size / n)
        rem = net_size % n
        ni, ti = 0, 0
        diff = mu + (rem > 0)
        shift = [0, 0, 0]
        ids = [[], [], []]
        while ti < len(plist):
            tensor = plist[ti]
            sizes = tuple(tensor.shape)
            cover = min(sizes[0] - shift[0], diff)
            diff -= cover
            ids[0].extend(range(shift[0], shift[0]+cover))
            if tensor.dim() >= 2: ids[1].extend([shift[1]] * cover)
            if tensor.dim() >= 3: ids[2].extend([shift[2]] * cover)
            shift[0] = (shift[0] + cover) % sizes[0]
            if not shift[0] and tensor.dim() >= 2: shift[1] = (shift[1] + 1) % sizes[1]
            if not shift[1] and tensor.dim() >= 3: shift[2] = (shift[2] + 1) % sizes[2]
            if tensor.dim() == 1:
                if diff == 0 or shift[0] == 0:
                    parts[ni][ti] = (torch.LongTensor(ids[0]),)
                    ids = [[], [], []]
            elif tensor.dim() == 2:
                if diff == 0 or shift[1] == 0:
                    parts[ni][ti] = (torch.LongTensor(ids[0]),
                                     torch.LongTensor(ids[1]))
                    ids = [[], [], []]
            else: #if tensor.dim() == 3:
                if diff == 0 or shift[2] == 0:
                    parts[ni][ti] = (torch.LongTensor(ids[0]),
                                     torch.LongTensor(ids[1]),
                                     torch.LongTensor(ids[2]))
                    ids = [[], [], []]
            if shift[0] == 0:
                if tensor.dim() == 1: ti += 1
                else:
                    if shift[1] == 0:
                        if tensor.dim() == 2: ti += 1
                        elif shift[2] == 0: ti += 1
            if diff == 0:
                ni += 1
                diff = mu
                if ni < rem: diff += 1
        return parts

    def merge(self, id_part: int,
              net1: TorchModel,
              net2: TorchModel,
              weights: Optional[Tuple[int, int]]=None) -> None:
        """Merges the partition with id ``id_part`` of two models.

        Parameters
        ----------
        id_part : int
            The index of the partition to be merged.
        net1 : TorchModel
            The first model to be merged.
        net2 : TorchModel
            The second model to be merged.
        weights : Optional[Tuple[int, int]], default=None
            This tuple represents the relative weights of the two models to be merged.
            If None, the weights are assumed to be equal, thus the merge is the average of the
            parameters.
        """
        assert str(net1) == self.str_arch, "net1 is not compatible."
        assert str(net2) == self.str_arch, "net2 is not compatible."
        id_part = id_part % self.n_parts
        plist1 = ParameterList(net1.parameters())
        plist2 = ParameterList(net2.parameters())
        w = weights if (weights is not None and weights != (0,0)) else (1,1)
        mul1, mul2 = w[0] / sum(w), w[1] / sum(w)
        with torch.no_grad():
            for i in range(len(plist1)):
                t_ids = self.partitions[id_part][i]
                if t_ids is not None:
                    plist1[i][t_ids] = mul1 * plist1[i][t_ids] + mul2 * plist2[i][t_ids]
The method _check verifies that no tensor has more than 3 dimensions, and _partition assumes that tensors have at most 3 dimensions (since _check is called first). Why this limitation? It looks like the code of _partition could be extended to support 4-dimensional tensors, but maybe I am missing something.
This limitation currently prevents us from using, for example, the LeNet-5 implemented here: https://github.com/lychengrex/LeNet-5-Implementation-Using-Pytorch/blob/master/LeNet-5%20Implementation%20Using%20Pytorch.ipynb
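
As a quick illustration (standard torch.nn, outside gossipy), the very first convolution of LeNet-5 already has a 4-dimensional weight tensor, so _check rejects it:

import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)  # LeNet-5's first conv layer
print(conv.weight.dim())  # 4 -> _check raises TypeError for dim() > 3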

Sincerely,
